How to synchronize two tables in SSIS

I have a scenario where I need to synchronize two tables in SSIS.
Table A is in database A and Table B is in database B. Both tables have the same schema. I need an SSIS package that synchronizes Table A with Table B in such a way that:
1. It inserts all the records that exist in Table A into Table B,
AND
2. It updates Table B if the same key exists in both tables but the record has been updated in Table A.
For example, Table A and Table B both contain Key = 123, but a few columns of that row have been updated in Table A.
I am thinking about using a Merge Join, but that only helps with inserting new records. How can I implement the UPDATE part as well?

1. It inserts all the records that exist in Table A into Table B
Use a Lookup transformation. The source will be Table A and the lookup reference will be Table B. Map the common columns in both tables and select the columns you need for the insert. After the Lookup, use an OLE DB Destination, map the columns coming from the Lookup, and insert them into Table B.
2. Update Table B if the same key exists in both, but the record has been updated in Table A
Same logic as above. Use a Lookup, but instead of an OLE DB Destination use an OLE DB Command, and then write the update SQL:
Update TableB
Set col1=?,col2=?....
In the column mappings, map the parameters to the columns coming out of the Lookup.
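Fully written out, the statement in the OLE DB Command might look like this sketch (Col1, Col2 and the key column [Key] are placeholder names taken from the question; the ? parameters are mapped by ordinal as Param_0, Param_1, ... in the Column Mappings tab):
UPDATE TableB
SET    Col1 = ?,    -- Param_0: Col1 coming from the Lookup output
       Col2 = ?     -- Param_1: Col2 coming from the Lookup output
WHERE  [Key] = ?;   -- Param_2: the business key
Note the WHERE clause on the key; without it the command would update every row in TableB for each incoming row.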
Check out this article
Checking to see if a record exists and if so update else insert
Using MERGE:
MERGE TableB b
USING TableA a
    ON b.[Key] = a.[Key]
WHEN MATCHED AND b.Col1 <> a.Col1 THEN
    UPDATE SET b.Col1 = a.Col1
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Key], Col1, Col2, Col3)
    VALUES (a.[Key], a.Col1, a.Col2, a.Col3);
You can execute the MERGE SQL in an Execute SQL Task in the Control Flow.
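If both databases live on the same SQL Server instance, the statement inside the Execute SQL Task can reference the tables with three-part names, roughly like this sketch (database, schema and column names are placeholders; with databases on different servers you would need a linked server or the data-flow approach instead):
MERGE DatabaseB.dbo.TableB AS b
USING DatabaseA.dbo.TableA AS a
    ON b.[Key] = a.[Key]
WHEN MATCHED AND (b.Col1 <> a.Col1 OR b.Col2 <> a.Col2) THEN
    UPDATE SET b.Col1 = a.Col1,
               b.Col2 = a.Col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Key], Col1, Col2)
    VALUES (a.[Key], a.Col1, a.Col2);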
Update: The Lookup transformation tries to perform an equi-join between values in the transformation input and values in the reference dataset.
You only need one Data Flow Task.
[Diagram: Source (Table A) → Lookup against Table B → OLE DB Destination (No Match Output) / OLE DB Command (Match Output)]
When a row coming from the source (Table A) has no matching value in the target table (Table B), the Lookup redirects it to the OLE DB Destination, which inserts it into Table B (Lookup No Match Output).
When a source row matches a target row on the business key, it is sent to the OLE DB Command, and the update SQL updates the corresponding row in Table B (Lookup Match Output).
This is just an overview. There is a problem with the above design: whenever a row matches, the target table is updated regardless of whether any column actually changed. So please refer to the article above, or look into the SCD (Slowly Changing Dimension) component in SSIS.
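If you want to avoid those pointless updates without the SCD component, one set-based option is to land the matched rows from Table A in a staging table and run a single UPDATE with a change predicate. A sketch, where Col1, Col2 and the staging table name Staging_TableA are placeholders:
UPDATE b
SET    b.Col1 = s.Col1,
       b.Col2 = s.Col2
FROM   TableB AS b
JOIN   Staging_TableA AS s
  ON   s.[Key] = b.[Key]
WHERE  b.Col1 <> s.Col1
   OR  b.Col2 <> s.Col2;
(If the columns are nullable, the <> comparisons would also need explicit IS NULL handling.)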
Update 2:
MERGE TableB b
USING TableA a
    ON b.[Key] = a.[Key]
WHEN MATCHED THEN
    UPDATE SET b.Col1 = a.Col1
WHEN NOT MATCHED BY TARGET AND a.IsReady = 1 THEN   -- IsReady is a bit column
    INSERT ([Key], Col1, Col2, Col3)
    VALUES (a.[Key], a.Col1, a.Col2, a.Col3);

Related

populate a table from another table including logging

I'm trying to populate a table from another table, including logging.
For example, there are two tables, A and B.
Data should be copied from B to A.
There is one primary key, called id, in both tables.
The script should update the matching rows if they exist.
The script should insert the rows from B that are missing from table A.
The data is expected to be around 800k rows, with 15 columns.
I have no idea what you mean by "including logging", but to insert/update from one table to another, use MERGE:
merge into a
using b on (b.id = a.id)
when matched then update
    set col1 = b.col1,
        col2 = b.col2
when not matched then insert (id, col1, col2)
    values (b.id, b.col1, b.col2);
This assumes the PK is named id in both tables.
merge into tableA a
using tableB b
on (a.id = b.id)
when matched then update set
--list columns here
when not matched then insert
--list columns to insert here
;
800k rows shouldn't be too much to insert in one transaction. If it is too much, you can use a cursor with BULK COLLECT and split the MERGE into a few steps, passing only part of the data to the USING clause each time. You will need to test which BULK COLLECT limit gives the best times.
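If batching does become necessary, another way to keep each transaction small is range-based batching over the key instead of BULK COLLECT. A rough sketch, assuming Oracle and a numeric id, with a batch size that is only a starting guess:
declare
  l_min  number;
  l_max  number;
  l_step number := 100000;  -- rows-per-batch by key range; tune by testing
begin
  select min(id), max(id) into l_min, l_max from b;
  while l_min is not null and l_min <= l_max loop
    merge into a
    using (select id, col1, col2
             from b
            where id between l_min and l_min + l_step - 1) src
       on (a.id = src.id)
    when matched then update set a.col1 = src.col1, a.col2 = src.col2
    when not matched then insert (id, col1, col2) values (src.id, src.col1, src.col2);
    commit;  -- one commit per slice
    l_min := l_min + l_step;
  end loop;
end;
/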

Using MERGE in SQL Server 2012 to insert/update data

I am using SQL Server 2012 and have two tables with identical structure. I want to insert new records from table 1 to table 2 if they don't already exist in table 2.
If they already exist, I want to update all of the existing records in table 2.
There are some 30 columns in my tables and I want to update all of them.
Can someone please help with this? I had a look at various links posted on the internet, but I don't quite understand what my statement should look like.
It's really not that hard....
You need:
a source table (or query) to provide data
a target table to merge it into
a condition on which those two tables are checked
a statement saying what to do if a match (on that condition) is found
a statement saying what to do if NO match (on that condition) is found
So basically, it's something like:
-- this is your TARGET table - this is where the data goes into
MERGE dbo.SomeTable AS target
-- this is your SOURCE table where the data comes from
USING dbo.AnotherTable AS source
-- this is the CONDITION they have to "meet" on
ON (target.SomeColumn = source.AnotherColumn)
-- if there's a match, so if that row already exists in the target table,
-- then just UPDATE whatever columns in the existing row you want to update
WHEN MATCHED THEN
UPDATE SET Name = source.Name,
OtherCol = source.SomeCol
-- if there's NO match, that is the row in the SOURCE does *NOT* exist in the TARGET yet,
-- then typically INSERT the new row with whichever columns you're interested in
WHEN NOT MATCHED THEN
INSERT (Col1, Col2, ...., ColN)
VALUES (source.Val1, source.Val2, ...., source.ValN);

Merge SQL With Condition

I have a scenario where I need to use a MERGE SQL statement to synchronize two tables. Let's suppose I have two tables, Table A and Table B. The schema is the same, with the exception of one extra column in Table A. That extra column is a flag that tells me which records are ready to be inserted/updated into Table B. Let's say that flag column is IsReady. It will be either true or false.
Can I use IsReady = true in the MERGE statement, or do I need a temp table, moving all records from Table A to the temp table where IsReady = true and then using MERGE SQL on the temp table and Table B?
Yes, you can use that column in the merge condition.
merge tableB targetTable
using tableA sourceTable
on sourceTable.IsReady = 1 and [any other condition]
when not matched then
insert ...
when matched and [...] then
update ...
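Filled in, and with the IsReady filter moved into the source query so the ON clause only carries the join key (otherwise matched-but-not-ready rows fall into the NOT MATCHED branch and get inserted), it could look like this sketch with placeholder column names:
MERGE TableB AS target
USING (SELECT [Key], Col1, Col2
       FROM   TableA
       WHERE  IsReady = 1) AS source
   ON target.[Key] = source.[Key]
WHEN MATCHED THEN
    UPDATE SET target.Col1 = source.Col1,
               target.Col2 = source.Col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Key], Col1, Col2)
    VALUES (source.[Key], source.Col1, source.Col2);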
This may help you:
merge into tableB
using tableA
on tableB.[Key] = tableA.[Key] and tableA.IsReady = 1   -- IsReady is the flag column in Table A
when not matched then
insert (field1,field2..)
values (tableA.field1,tableA.field2..);

How to fix this stored procedure problem

I have 2 tables. The following is just a stripped-down version of these tables.
TableA
Id <pk> incrementing
Name varchar(50)
TableB
TableAId <pk> non incrementing
Name varchar(50)
Now these tables have a relationship to each other.
Scenario
User 1 comes to my site and does some actions (in this case, adds rows to Table A). So I use SqlBulkCopy to copy all this data into Table A.
However, I also need to add the data to Table B, but I don't know the newly created Ids from Table A, as SqlBulkCopy won't return them.
So I am thinking of having a stored procedure that finds all the Ids that don't exist in Table B and then inserts them.
INSERT INTO TableB (TableAId , Name)
SELECT Id,Name FROM TableA as tableA
WHERE not exists( ...)
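Roughly something like this, based on the table definitions above:
INSERT INTO TableB (TableAId, Name)
SELECT a.Id, a.Name
FROM   TableA AS a
WHERE  NOT EXISTS (SELECT 1 FROM TableB AS b WHERE b.TableAId = a.Id);  -- sketch: only Ids not already in TableB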
However, this comes with a problem. A user can delete something from Table B at any time, so if a user deletes a row and then another user (or even the same user) comes along and does something to Table A, my stored procedure will bring back that deleted row into Table B, since it still exists in Table A but not in Table B and therefore satisfies the stored procedure's condition.
So is there a better way of dealing with two tables that need to be updated when using bulk insert?
SqlBulkCopy complicates this, so I'd consider using a staging table and an OUTPUT clause.
Example, in a mixture of client pseudo-code and SQL:
create SQLConnection
Create #temptable
Bulkcopy to #temptable
Call proc on same SQLConnection
proc:
INSERT tableA (..)
OUTPUT INSERTED.key, .. INTO TableB
SELECT .. FROM #temptable
close connection
Notes:
temptable will be local to the connection and be isolated
the writes to A and B will be atomic
overlapping or later writes don't interfere with what happens to A and B
emphasising the last point, A and B will only ever be populated from the set of rows in #temptable
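Written out as T-SQL, the proc step above might look like this sketch (it assumes the stripped-down shapes from the question: TableA(Id identity, Name), TableB(TableAId, Name), and a #temptable holding the bulk-copied names; note that the target of OUTPUT ... INTO cannot participate in a foreign key constraint, so if TableB has an enforced FK to TableA the output would have to go through another temp table first):
INSERT INTO TableA (Name)
OUTPUT INSERTED.Id, INSERTED.Name
INTO   TableB (TableAId, Name)
SELECT Name
FROM   #temptable;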
Alternative:
Add another column to A and B called sessionid and use that to identify row batches.
One option would be to use SQL Server's OUTPUT clause:
INSERT YourTable (name)
OUTPUT INSERTED.*
VALUES ('NewName')
This will return the id, name of the inserted rows to the client, so you can use them in the insert operation for the second table.
Just as an alternative solution, you could use database triggers to update the second table.
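A minimal trigger sketch, assuming the same TableA/TableB shapes as above (the trigger name is a placeholder); note that SqlBulkCopy only fires triggers when the SqlBulkCopyOptions.FireTriggers option is set:
CREATE TRIGGER trg_TableA_Insert ON TableA
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- mirror every newly inserted TableA row into TableB
    INSERT INTO TableB (TableAId, Name)
    SELECT Id, Name
    FROM   inserted;
END;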

"Merging" two tables in T-SQL - replacing or preserving duplicate IDs

I have a web application that uses a fairly large table (millions of rows, about 30 columns). Let's call that TableA. Among the 30 columns, this table has a primary key named "id", and another column named "campaignID".
As part of the application, users are able to upload new sets of data pertaining to new "campaigns".
These data sets have the same structure as TableA, but typically only about 10,000-20,000 rows.
Every row in a new data set will have a unique "id", but they'll all share the same campaignID. In other words, the user is loading the complete data for a new "campaign", so all 10,000 rows have the same "campaignID".
Usually, users are uploading data for a NEW campaign, so there are no rows in TableA with the same campaignID. Since the "id" is unique to each campaign, the id of every row of new data will be unique in TableA.
However, in the rare case where a user tries to load a new set of rows for a "campaign" that's already in the database, the requirement was to remove all the old rows for that campaign from TableA first, and then insert the new rows from the new data set.
So, my stored procedure was simple:
BULK INSERT the new data into a temporary table (#tableB)
Delete any existing rows in TableA with the same campaignID
INSERT INTO TableA ([columns]) SELECT [columns] FROM #TableB
Drop #TableB
This worked just fine.
But the new requirement is to give users 3 options when they upload new data for handling "duplicates" - instances where the user is uploading data for a campaign that's already in TableA.
Remove ALL data in TableA with the same campaignID, then insert all the new data from #TableB. (This is the old behavior. With this option, they'll never be duplicates.)
If a row in #TableB has the same id as a row in TableA, then update that row in TableA with the row from #TableB (Effectively, this is "replacing" the old data with the new data)
If a row in #TableB has the same id as a row in TableA, then ignore that row in #TableB (Essentially, this is preserving the original data, and ignoring the new data).
A user doesn't get to choose this on a row-by-row basis. She chooses how the data will be merged, and this logic is applied to the entire data set.
In a similar application I worked on that used MySQL, I used the "LOAD DATA INFILE" function, with the "REPLACE" or "IGNORE" option. But I don't know how to do this with SQL Server/T-SQL.
Any solution needs to be efficient enough to handle the fact that TableA has millions of rows, and #TableB (the new data set) may have 10k-20k rows.
I googled for something like a "Merge" command (something that seems to be supported for SQL Server 2008), but I only have access to SQL Server 2005.
In rough pseudocode, I need something like this:
If user selects option 1:
[I'm all set here - I have this working]
If user selects option 2 (replace):
merge into TableA as Target
using #TableB as Source
on TableA.id=#TableB.id
when matched then
update row in TableA with row from #TableB
when not matched then
insert row from #TableB into TableA
If user selects option 3 (preserve):
merge into TableA as Target
using #TableB as Source
on TableA.id=#TableB.id
when matched then
do nothing
when not matched then
insert row from #TableB into TableA
How about this?
option 2:
begin tran;
delete from tablea where exists (select 1 from tableb where tablea.id=tableb.id);
insert into tablea select * from tableb;
commit tran;
option 3:
begin tran;
delete from tableb where exists (select 1 from tablea where tablea.id=tableb.id);
insert into tablea select * from tableb;
commit tran;
As for performance, so long as the id field(s) in tablea (the big table) are indexed, you should be fine.
Why are you using upserts when he says he wanted a MERGE? MERGE in SQL Server 2008 is faster and more efficient.
I would let MERGE handle the differences.
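For reference, on SQL Server 2008 or later the two MERGE variants from the question's pseudocode could be written roughly as follows (column names beyond id and campaignID are placeholders; this doesn't help on the asker's SQL Server 2005):
-- Option 2 ("replace"): update matching rows, insert the rest
MERGE TableA AS Target
USING #TableB AS Source
   ON Target.id = Source.id
WHEN MATCHED THEN
    UPDATE SET Target.campaignID = Source.campaignID   -- plus the other columns
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, campaignID)                             -- plus the other columns
    VALUES (Source.id, Source.campaignID);
-- Option 3 ("preserve") is the same statement with the WHEN MATCHED branch removed,
-- so existing rows in TableA are left untouched.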