I am working on a task where my source is AWS RDS - SQL Server and my target is Azure SQL Server.
There's a table with 80M records in my source that needs to be merged with my target table.
This merge will happen every 15 minutes, and based on the business key I need to:
Update the target table if the key's record is updated in the source table.
Insert new keys into the target table.
Mark isDeleted as true in the target if the key is no longer present in the source.
Important note: source rows are hard-deleted and no history is maintained.
Since this merge runs every 15 minutes and the source table is pretty big, I use a lastUpdated column to select only recently changed records in the source query of my merge query.
With this I can handle the "upsert" scenario perfectly, but the delete branch marks almost every record in the target as deleted, because the filtered source no longer contains the unchanged keys. That is not desirable.
I have tried the following option:
Read the entire source table into a temp table every 15 minutes and then perform the merge from the temp table to the target table. But this is very costly in terms of processing and time.
Is there any better way to handle this scenario? I am happy to share more information as needed.
I think you can solve the problem by adding a new column called SyncStamp. The idea is that every row the MERGE inserts or updates gets the same SyncStamp value, so any row that does not carry that value afterwards must be missing from the source and should be flagged IsDeleted.
I prefer to use the actual timestamp for SyncStamp, but you could choose random numbers instead.
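The target table needs the new column first; a one-time change along these lines should do (assuming the TargetTable shown below, and defaulting existing rows to 0 so the first sync stamps them all):

ALTER TABLE TargetTable ADD SyncStamp bigint NOT NULL DEFAULT 0;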
-- get a timestamp (seconds since the Unix epoch)
DECLARE @SyncStamp bigint = DATEDIFF_BIG(SECOND, '1970-01-01 00:00:00', GETUTCDATE());
MERGE TargetTable AS Target
USING SourceTable AS Source
ON Source.BusinessKey = Target.BusinessKey
-- For Inserts
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, ProductName, SyncStamp)
    VALUES (Source.ProductID, Source.ProductName, @SyncStamp)
-- For Updates
WHEN MATCHED THEN
    UPDATE SET
        Target.ProductName = Source.ProductName,
        Target.SyncStamp = @SyncStamp;

-- Flag rows that are gone from the source
UPDATE TargetTable
SET IsDeleted = 1
WHERE IsDeleted = 0 AND SyncStamp <> @SyncStamp;
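Since the MERGE and the follow-up UPDATE are two separate statements, a reader could briefly see the upserted rows next to stale IsDeleted flags. If that matters, the pair can be wrapped in one transaction; a minimal sketch, assuming the same table and column names as above:

DECLARE @SyncStamp bigint = DATEDIFF_BIG(SECOND, '1970-01-01 00:00:00', GETUTCDATE());

BEGIN TRANSACTION;

MERGE TargetTable AS Target
USING SourceTable AS Source
    ON Source.BusinessKey = Target.BusinessKey
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, ProductName, SyncStamp)
    VALUES (Source.ProductID, Source.ProductName, @SyncStamp)
WHEN MATCHED THEN
    UPDATE SET
        Target.ProductName = Source.ProductName,
        Target.SyncStamp = @SyncStamp;

-- both writes commit together, so a reader never sees one without the other
UPDATE TargetTable
SET IsDeleted = 1
WHERE IsDeleted = 0 AND SyncStamp <> @SyncStamp;

COMMIT TRANSACTION;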
I have an upsert implemented with the MERGE statement in a stored procedure in Microsoft SQL Server 2017 Standard Edition.
The problem is I'm getting multiple inserts when I make concurrent calls to the stored procedure. I'm able to reproduce the behavior using JMeter with lots of concurrent threads. JMeter hits a Java web app which calls the stored procedure using JDBC. After deleting all the rows and running JMeter, it often creates just 1 row, but sometimes it creates 2 or more rows. I think I've seen it create up to 6 rows.
I thought this was impossible using MERGE. The answers to this question all say a transaction is unnecessary:
Is it necessary to encapsulate a single merge statement (with insert, delete and update) in a transaction?
Basically, I want the table to store the max size (LQ_SIZE) value for every day, along with the time (LQ_TIMESTAMP) when that max size occurred. I'm doing two slightly unusual things in my upsert:
1. I'm matching on the timestamps cast to a date, so I'm inserting or updating the row for the day, ignoring the time.
2. My WHEN MATCHED clause has an AND condition so it only updates the row if the new size is greater than the current size.
Here's my table and stored procedure with MERGE statement:
CREATE TABLE LOG_QUEUE_SIZE (
LQ_APP_ID SMALLINT NOT NULL,
LQ_TIMESTAMP DATETIME2,
LQ_SIZE INT
);
GO
-- note: the procedure needs a name different from the LOG_QUEUE_SIZE table;
-- tables and procedures share one namespace per schema
CREATE PROCEDURE LOG_QUEUE_SIZE_UPSERT (
@P_TIMESTAMP DATETIME2,
@P_APP_ID SMALLINT,
@P_QUEUE_SIZE INT
)
AS
BEGIN
-- INSERT or UPDATE the max LQ_SIZE for today in the LOG_QUEUE_SIZE table
MERGE
LOG_QUEUE_SIZE target
USING
(SELECT @P_APP_ID NEW_APP_ID, @P_TIMESTAMP NEW_TIMESTAMP, @P_QUEUE_SIZE NEW_SIZE) source
ON
LQ_APP_ID=NEW_APP_ID
AND CAST(NEW_TIMESTAMP AS DATE) = CAST(LQ_TIMESTAMP AS DATE) -- Truncate the timestamp to the day
WHEN MATCHED AND NEW_SIZE > LQ_SIZE THEN -- Only update if we have a new max size for today
UPDATE
SET
LQ_TIMESTAMP = NEW_TIMESTAMP,
LQ_SIZE = NEW_SIZE
WHEN NOT MATCHED BY TARGET THEN -- Otherwise insert the new size
INSERT
(LQ_APP_ID,
LQ_TIMESTAMP,
LQ_SIZE)
VALUES
(NEW_APP_ID,
NEW_TIMESTAMP,
NEW_SIZE);
END
Using a transaction (with BEGIN TRAN...COMMIT around the MERGE) appears to prevent the problem, but the performance is terrible.
Why am I getting multiple inserts if MERGE is atomic? How can I prevent it?
MERGE compiles into multiple underlying operations, which means concurrent sessions can still conflict with each other. You should lock explicitly:
MERGE dbo.TableName WITH (HOLDLOCK) AS target
USING ... AS source ...;
https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/
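Applied to the procedure from the question, only the target reference changes; a sketch:

MERGE LOG_QUEUE_SIZE WITH (HOLDLOCK) target
USING
    (SELECT @P_APP_ID NEW_APP_ID, @P_TIMESTAMP NEW_TIMESTAMP, @P_QUEUE_SIZE NEW_SIZE) source
ON
    LQ_APP_ID = NEW_APP_ID
    AND CAST(NEW_TIMESTAMP AS DATE) = CAST(LQ_TIMESTAMP AS DATE)
WHEN MATCHED AND NEW_SIZE > LQ_SIZE THEN
    UPDATE SET
        LQ_TIMESTAMP = NEW_TIMESTAMP,
        LQ_SIZE = NEW_SIZE
WHEN NOT MATCHED BY TARGET THEN
    INSERT (LQ_APP_ID, LQ_TIMESTAMP, LQ_SIZE)
    VALUES (NEW_APP_ID, NEW_TIMESTAMP, NEW_SIZE);

HOLDLOCK (equivalent to SERIALIZABLE) keeps the key-range lock taken by the match check until the transaction ends, so two concurrent calls cannot both decide the row is missing and both insert.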
I want to track the update changes in a table via a trigger:
CREATE TABLE dbo.TrackTable(...columns same as target table)
GO
CREATE TRIGGER dboTrackTable
ON dbo.TargetTable
AFTER UPDATE
AS
INSERT INTO dbo.TrackTable (...columns)
SELECT (...columns)
FROM Inserted
However, in real production some of the update queries select rows with broad conditions and update them all, regardless of whether anything actually changed, for example:
UPDATE Targettable
SET customer_type = 'VIP'
WHERE 1 = 1
--or is_obsolete = 0 or register_date < '20160101' something
But because of the table size, and for the analysis, I only want to capture the rows whose data actually changed. How can I achieve this?
My track table has many columns (so I would rather not compare the inserted and deleted tables column by column), but its structure seldom changes.
I think the following code will do what you want.
CREATE TABLE dbo.TrackTable(...columns same as target table)
GO
CREATE TRIGGER dboTrackTable
ON dbo.TargetTable
AFTER UPDATE
AS
INSERT INTO dbo.TrackTable (...columns)
-- EXCEPT returns only the new row images that differ from the old ones
SELECT *
FROM Inserted
EXCEPT
SELECT *
FROM Deleted
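One caveat worth knowing: this trick requires the columns of inserted and deleted to be comparable (text, ntext and image columns are not), and EXCEPT treats NULLs as equal, which is exactly what change detection wants; a plain <> comparison would silently skip NULL-to-value transitions. A quick illustration:

-- no row returned: the values are the same
SELECT CAST(1 AS int) AS col EXCEPT SELECT CAST(1 AS int);
-- no row returned: EXCEPT considers NULL equal to NULL
SELECT CAST(NULL AS int) AS col EXCEPT SELECT CAST(NULL AS int);
-- one row returned: a NULL-to-value transition is detected
SELECT CAST(1 AS int) AS col EXCEPT SELECT CAST(NULL AS int);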
I realize this post is a couple months old now, but for anyone looking for a well-rounded answer:
To exit the trigger if no rows were affected on SQL Server 2016 and up, Microsoft recommends using the built-in ROWCOUNT_BIG() function in the Optimizing DML Triggers section of the Create Trigger documentation.
Usage:
IF ROWCOUNT_BIG() = 0
RETURN;
To ensure you are excluding rows that were not changed, you'll need to do a compare of the inserted and deleted tables inside the trigger. Taking your example code:
INSERT INTO dbo.TrackTable (...columns)
SELECT (...columns)
FROM Inserted i
INNER JOIN deleted d
ON d.[SomePrimaryKeyCol]=i.[SomePrimaryKeyCol] AND
i.customer_type<>d.customer_type
Microsoft documentation and w3schools are great resources for learning how to leverage various types of queries and trigger best practices.
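Putting the two together, a trigger skeleton might look like this (table, key and column names are placeholders carried over from the examples above):

CREATE TRIGGER dbo.TrackTargetTableChanges
ON dbo.TargetTable
AFTER UPDATE
AS
BEGIN
    -- cheap early exit when the UPDATE touched no rows
    IF ROWCOUNT_BIG() = 0
        RETURN;
    SET NOCOUNT ON;

    -- log only the rows whose customer_type value actually changed
    INSERT INTO dbo.TrackTable (SomePrimaryKeyCol, customer_type)
    SELECT i.SomePrimaryKeyCol, i.customer_type
    FROM inserted i
    INNER JOIN deleted d
        ON d.SomePrimaryKeyCol = i.SomePrimaryKeyCol
       AND i.customer_type <> d.customer_type;
END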
Prevent trigger from doing anything if no rows changed.
Writing-triggers-the-right-way
CREATE TRIGGER the_trigger on dbo.Data
after update
as
begin
if @@ROWCOUNT = 0
return
set nocount on
/* Some Code Here */
end
Get a list of rows that changed:
CREATE TRIGGER the_trigger on dbo.data
AFTER UPDATE
AS
SELECT * from inserted
Previous stack overflow on triggers
@anna - as per @Oded's answer, when an update is performed, the old values are in the deleted table and the new values are in the inserted table.
I'm trying to merge a temporary table, populated with data from a C# application, with a table in my SQL database.
Having read through some articles, it seems that SQL MERGE is the most appropriate option, and it works great when I'm updating or deleting groups of entries from the database.
My problem is that I'm unable to add a new row of data where the foreign keys don't match what is already in the database table; instead the merge just removes the rows from the database table and replaces them with the ones from the temporary table.
I have two foreign keys referenced in the table FirstDBTableID and SecondDBTableID.
This is my SQL so far:
ALTER PROCEDURE [dbo].[SP_UpdateTable]
@TempTable AS TempTableType READONLY
AS
BEGIN
MERGE dbo.Table AS target
USING @TempTable AS source ON target.FirstDBTableID = source.FirstDBTableID
AND target.SecondDBTableID = source.SecondDBTableID
WHEN MATCHED THEN
UPDATE SET target.Provider = source.Provider, target.Value = source.Value, target.Quantity = source.Quantity
-- value doesn't exist in the target table, so add it
WHEN NOT MATCHED BY TARGET THEN
INSERT (FirstDBTableID, SecondDBTableID, Provider, Value, Quantity)
VALUES (source.FirstDBTableID, source.SecondDBTableID, source.Provider, source.Value, source.Quantity)
-- value doesn't exist in the source table, so delete from target
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
END
Have I missed a statement? Maybe SQL MERGE isn't going to work here? If I haven't made it clear, please ask me questions.
Thanks in advance.
I have a data warehouse and a staging DB. A new file arrives on an FTP every day and gets loaded into the staging DB, and from there the data is inserted/updated/deleted in the warehouse. However, the staging file holds only the last 5 days' records on a rolling basis. That is, today it covers 8/8 to 8/13, but tomorrow the file will have data from 8/9 to 8/14, while the DB warehouse has all the history.
When I use
WHEN NOT MATCHED BY SOURCE THEN DELETE
it will delete all the records from the DB warehouse which do not match the staging table, which would wipe out all the history. I want the delete to consider only the last 5 days of warehouse data when checking whether a row no longer matches the source. Here is the query:
MERGE INTO
[x].[y].[z] AS Target
USING [a].[y].[z] AS Source
ON Target.[PROBLEM_ID] = Source.[PROBLEM_ID]
WHEN MATCHED THEN
UPDATE SET
Target.[CUSTNO] = Source.[CUSTNO],
Target.[SALESID] = Source.[SALESID],
Target.[PCODE] = Source.[PCODE]
WHEN NOT MATCHED BY TARGET THEN
INSERT
([CUSTNO]
,[SALESID]
,[PCODE])
VALUES
(source.[CUSTNO]
,source.[SALESID]
,source.[PCODE])
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
Can I get a constraint on the delete statement to go only 5 days back on the DB warehouse? If yes, please help me with the constraint code.
I haven't tried this, but the documentation says that you can add an "AND" clause to "WHEN NOT MATCHED BY SOURCE". This would let you do this:
WHEN NOT MATCHED BY SOURCE AND Your_Date_Field > DateAdd(Day,-5,GetDate())
THEN DELETE;
Note that if your dates include times, you might need to truncate the time portion before you compare the dates.
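Plugged into the MERGE from the question, it would look something like this (a sketch; [LOAD_DATE] is a hypothetical date column on the warehouse table, so substitute whatever you actually store):

MERGE INTO [x].[y].[z] AS Target
USING [a].[y].[z] AS Source
    ON Target.[PROBLEM_ID] = Source.[PROBLEM_ID]
WHEN MATCHED THEN
    UPDATE SET Target.[CUSTNO]  = Source.[CUSTNO],
               Target.[SALESID] = Source.[SALESID],
               Target.[PCODE]   = Source.[PCODE]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([CUSTNO], [SALESID], [PCODE])
    VALUES (Source.[CUSTNO], Source.[SALESID], Source.[PCODE])
-- only rows dated within the last 5 days are candidates for deletion
WHEN NOT MATCHED BY SOURCE
     AND Target.[LOAD_DATE] > DATEADD(DAY, -5, GETDATE())
    THEN DELETE;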
Here's basically what you want to do. You use a common table expression to build up a more complex set to merge against. You can also do an "and" on when matched and when not matched, but I find it cleaner to start first with a data set built to purpose in a cte.
Peace
Katherine
with [merge_helper] ([id], [custno], [salesid], [pcode])
as (select [source].[id],
           [source].[custno],
           [source].[salesid],
           [source].[pcode]
    from [a].[y].[z] as [source]
    union
    -- add back the target-only rows, so the DELETE branch below never
    -- touches historical rows that have aged out of the staging table
    select [target].[id],
           [target].[custno],
           [target].[salesid],
           [target].[pcode]
    from [x].[y].[z] as [target]
    where [target].[id] not in (select [id]
                                from [a].[y].[z]))
merge into [x].[y].[z] as target
using [merge_helper] as source
on target.[id] = source.[id]
when matched then
update set target.[custno] = source.[custno],
target.[salesid] = source.[salesid],
target.[pcode] = source.[pcode]
when not matched by target then
insert ([custno],
[salesid],
[pcode])
values (source.[custno],
source.[salesid],
source.[pcode])
when not matched by source then
delete;
I have a SQL statement to insert data into a table for archiving, but I need a MERGE statement that runs on a monthly basis to update the new table (2) with any data that changed in the old table (1) and should now be moved into the archive.
Part of the issue is to remove the moved data from the old table. My insert does not do that, but I need the archived data to be purged from the original table.
Is there a single SQL statement that will move data out of one table into another in this way? Or does it need to be a two-step operation?
The initial statement moved data depending on age and a few other factors.
The insert is:
INSERT /*+ append */
INTO tab1
SELECT *
FROM tab2
WHERE (Postingdate < TO_DATE ('2001/07/01', 'yyyy/mm/dd')
OR jobname IS NULL)
AND STATUS <> '45';
All help appreciated...
The MERGE statement will let you do this in one statement, by adding a DELETE clause to the update branch. See the Oracle documentation on MERGE.
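For illustration, a sketch of that syntax using the asker's column names where possible (the id join key is a hypothetical). Note that DELETE WHERE removes rows of the merge target that the WHEN MATCHED branch just updated, so purging the moved rows from the old source table would still be a second statement:

MERGE INTO tab1 t
USING tab2 s
  ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.status = s.status
  -- deletes only target rows just updated by this branch
  DELETE WHERE (t.postingdate < TO_DATE('2001/07/01', 'yyyy/mm/dd'))
WHEN NOT MATCHED THEN
  INSERT (t.id, t.status, t.postingdate)
  VALUES (s.id, s.status, s.postingdate);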
I think you should try this with a partitioned table. My idea is to create a table with a range partition on the date:

-- names are examples; note that range partition bounds must be literals,
-- so a rolling sysdate-30 bound is not allowed (interval partitioning would be needed)
CREATE TABLE archive_demo (id NUMBER PRIMARY KEY, name VARCHAR2(100), j_date DATE)
PARTITION BY RANGE (j_date)
(PARTITION old_rows VALUES LESS THAN (TO_DATE('2001-07-01', 'YYYY-MM-DD')),
 PARTITION current_rows VALUES LESS THAN (MAXVALUE));

Then move that partition into another table and truncate that partition.
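A minimal sketch of that move, assuming the archive_demo table above (all names hypothetical):

-- create an empty table with the same column structure (no constraints copied)
CREATE TABLE archive_swap AS
  SELECT * FROM archive_demo WHERE 1 = 0;

-- swap segments: the partition's rows move into archive_swap and the
-- partition is left holding archive_swap's (empty) former contents
ALTER TABLE archive_demo
  EXCHANGE PARTITION old_rows WITH TABLE archive_swap;

If archive_demo has global indexes (for example from the primary key), they may be left unusable by the exchange, so add UPDATE GLOBAL INDEXES or rebuild them afterwards.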