I'm trying to create an SSIS package that will copy data from Table-A on Server-A into Table-B on Server-B. To avoid duplicates, I want to update the records that already exist in Table-B if their data has changed. Please let me know what would be the best approach for this.
Thank you.
You should use the SSIS Sort Transformation to remove duplicate records.
Drag a Sort Transformation onto the data flow and connect the Flat File Source to it. Double-click the Sort Transformation, choose the columns to sort by, check the box "Remove rows with duplicate sort values", and then click OK.
The SSIS Sort Transformation is useful whenever you need data in a particular sort order.
Create a regular data flow with 2 components - an OLE DB Source and an OLE DB Destination (I assume you are using MS SQL Server; in general, use whatever components your company uses to connect to the DB).
Since there are 2 DBs, create 2 connection managers, each pointing to its own DB. Point the OLE DB Source at the connection manager configured for the source database, and the OLE DB Destination at the connection manager configured for the destination database.
Now point the OLE DB Source at the source table in the source DB and leave all the fields intact. Connect the source and destination components with the green arrow coming out of the source component. Then point the OLE DB Destination at the destination table in the target DB. Double-click the destination, go to Mappings and make sure they are correct (SSIS tries to map automatically using strict name matching); if the names are different, connect the source and destination fields manually. That's it - you simply don't provide mappings for the fields that the destination table cannot accommodate.
Alternatively, you can leave out the columns you don't need at the source component - double-click it, go to Columns and uncheck the columns you don't need.
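If you prefer, the same trimming can be done with a SQL command at the source instead of unchecking columns. A minimal sketch, assuming hypothetical table and column names:
-- Source query that brings only the columns the destination needs
-- (dbo.TableA and its columns are placeholders for illustration)
select Id
     , Name
     , UpdatedAt
from dbo.TableA;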
I have a data source which is something like
select
patient_id
from patient_table
the destination is a CSV file.
Now I want to add patient_name to both the source and the destination.
I go to the source and I change the query to
select
patient_id,
patient_name
from patient_table
After I add this, when I click on Columns, the patient_name column is not there.
The same thing happens for my destination. I have a Flat File Destination with the patient_id column, so I add the patient_name column to the actual .csv file, but that column is not reflected in the Flat File Connection Manager.
The only way I've been able to get these new columns to show up is to delete the Data Flow Task, connection managers, sources and destinations and recreate everything from scratch.
Is there any other way to do this?
I just created a simple data flow with an OLE DB source and a flat file destination. After adding a second column to the OLE DB source, I double-clicked my flat file destination, which opened the Flat File Destination Editor. Clicking Update added the 2nd column to the flat file connection.
Are you using the latest tooling available to modify SSIS packages?
I don't have an SSDT installation handy right now, so I'll do the best I can without screenshots (and working from memory).
In the Source object, after you add the column to the text of your query, click on Columns, which you already know. Your new column doesn't show up in the list at the bottom yet, which you also know. Up in the top of that window, there's a grid representation of the result set from the query. Find your new column in that grid and check the box to tell the connector you want that column to enter the data flow.
Now go to the connection manager for the .csv file. Add the new column there.
Once it's in the connection manager, now you should be able to map it in the destination object.
There's a possibility that you'll have to click on the arrow or arrows in your data flow task and map the new column in those, too, but it doesn't always happen like that. I haven't taken the time to figure out why that's necessary sometimes and not others, but you'll know right away because the arrows will have red Xs on them.
And that should get you there.
I'm importing a big flat file (about 400,000 rows and 255+ columns) into SQL Server through the Import Wizard in SQL Server Management Studio.
To get the right data types I use Suggest Types, but I have found that I need to scan all the rows to get the correct types, and it takes a very long time. Is there a way to avoid this or make it faster?
Furthermore, my real goal is to transfer data from one SQL Server database to another on a different computer. I do this by exporting the data as a flat file, but maybe that is a bad idea since I lose the information about the correct data types?
Thanks!
According to Copy one database to another database:
There are several ways to do this, below are two options:
Option 1
Right click on the database you want to copy
Choose 'Tasks' > 'Generate scripts'
'Select specific database objects'
Check 'Tables'
Mark 'Save to new query window'
Click 'Advanced'
Set 'Types of data to script' to 'Schema and data'
Next, Next
You can now run the generated query on the new database.
Option 2
Right click on the database you want to copy
'Tasks' > 'Export Data'
Next, Next
Choose the database to copy the tables to
Mark 'Copy data from one or more tables or views'
Choose the tables you want to copy
Finish
Back up your database and restore it on the other server (the target server must be the same or a higher version), or simply copy the database files to the other server and attach them (when copying database files, you must ensure that you have either detached the database or stopped the SQL Server service).
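A minimal T-SQL sketch of that backup/restore approach; the database name, file paths and logical file names below are assumptions for illustration (check yours with RESTORE FILELISTONLY):
-- On the source server: take a full backup
backup database SourceDb
to disk = N'C:\Backups\SourceDb.bak'
with init;

-- Copy the .bak file to the target server, then restore it there
restore database SourceDb
from disk = N'C:\Backups\SourceDb.bak'
with move 'SourceDb' to N'D:\Data\SourceDb.mdf',
     move 'SourceDb_log' to N'D:\Data\SourceDb_log.ldf';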
I am just trying to find out whether this is the right way to do this task.
Any other suggestions to improve this are greatly appreciated.
I have the following on my SSIS package.
Data Flow Task - with an OLE DB connection established to the source database where the view is.
Execute SQL Task - I am executing an INSERT INTO Destination ... EXCEPT query (excluding all those records that are already there from the source).
Send Mail Task - to send out an email.
How do I know whether the data transfer was successful, so that I can use the Send Mail Task to indicate success or failure?
How do I schedule this package so that it runs automatically (every Tuesday)?
I have tried the suggestion below. Please refer to the new Data Flow Task:
OLE DB Source - points to a view on database server 1.
The Lookup gets all the rows from the OLE DB Source (the row count on the source and on the Lookup match).
On the Lookup task, I have configured the error output to use 'Redirect row' on all the mapped columns.
The OLE DB Destination points to the destination table, which already has a subset of the records from the source, so I configured the error output to supply the unmatched rows for insert.
When I execute the package, I get a primary key constraint error: Cannot insert duplicate key.
Any suggestions?
You will want to double-click the connector from the Execute SQL Task to the Send Mail Task. Currently it's green, which indicates it will only take that path on Success. You will want to update the constraint to be on Completion, as you don't care whether it's Success or Failure.
It sounds like you have your data flow pulling all of the data from your source and writing it to a staging table. In your Execute SQL Task, you then use a query to add data into your target table where it doesn't exist.
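A minimal sketch of that kind of query, using placeholder staging/target table names and a single assumed business key column:
-- Insert only the staged rows that are not already in the target
-- (dbo.TargetTable, dbo.StagingTable, BusinessKey and SomeColumn are placeholders)
insert into dbo.TargetTable (BusinessKey, SomeColumn)
select s.BusinessKey, s.SomeColumn
from dbo.StagingTable as s
except
select t.BusinessKey, t.SomeColumn
from dbo.TargetTable as t;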
This can be consolidated into a single Data Flow. Between your OLE DB Source and OLE DB Destination, add a Lookup task. Since you are on 2005, the Lookup behaves a bit differently than in 2008+. You will write a query that pulls back the business keys in your target table and then compare that to what is coming from your OLE DB Source. Map those keys in the interface.
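A minimal sketch of the Lookup's reference query, again using a placeholder target table and key column:
-- Pull back only the business key(s); the Lookup doesn't need any other columns
select t.BusinessKey
from dbo.TargetTable as t;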
You only want the rows that aren't matched, so you will need to get the "unmatched records" from the Lookup. In 2005 the option for an unmatched output didn't exist, so you will need to route the error output to your OLE DB Destination.
Andy Leonard has a nice little write-up on how to accomplish this: Configuring an SSIS 2005 Lookup Transformation for a Left Outer Join. The only difference in your case is that you don't care about the matched rows. Instead of Ignore Failure, you want to select Redirect Row. Then, when you go to connect the Lookup to the OLE DB Destination, you will be presented with two options: the green connector is for matched rows, the red connector is for unmatched rows. Tie the red path to your destination.
The requirement is to move data older than 3 years from a production DB to an archive DB and, once moved, delete those records from the production DB, so that at any point in time the production DB holds only the last three years of records.
I want to achieve this with SSIS. I have read quite a few articles about data archival but couldn't figure out the best approach.
I am new to SSIS.
I want to achieve exactly something like this (the answer given in the link below), with the extra condition that only records older than 3 years are moved and then deleted:
https://dba.stackexchange.com/questions/25867/moving-data-from-one-db-to-another-using-ssis
Criteria for an accepted answer: the answer should address
scalability
complexity
failure handling
reliability
You can use the OUTPUT clause to delete the data and return the rows to be moved in one go.
-- Sample table and data (the sample dates assume "today" is in 2013)
create table ProductionTable
(
    ValueDate datetime not null
    , Data varchar(max) not null
)

insert ProductionTable values ('20100101', '3 years ago')
insert ProductionTable values ('20130425', 'this year')
insert ProductionTable values ('20130426', 'this year')

-- Delete everything older than 3 years; the OUTPUT clause returns the
-- deleted rows, which is what flows on to the archive
delete ProductionTable
output deleted.ValueDate, deleted.Data
where ValueDate <= dateadd(year, -3, getdate())
The code can also be accessed on SQLFiddle
Now I will show you the exact steps you need to follow in SSIS to reproduce the example:
Create a new project and define your data sources for ProductionDB and ArchiveDB.
In "Control Flow" tab, create a "Data Flow Task".
In "Data Flow" tab, create a "OLE DB Source" and a "OLE DB Destination".
In "OLE DB Source", select ProductionDB and choose "SQL command" as the data access mode. Paste in the delete statement with the output clause.
Click on "Columns" and then OK.
In "OLE DB Destination", select ArchiveDB and choose "Table or view - fast load" as the data access mode and then choose your ArchiveTable.
Click on "Mappings" and then Ok.
Run the package and you should be able to verify that one row is deleted from ProductionTable and moved to ArchiveTable.
Hope it helps.
Other things to keep in mind: because you are deleting and moving data around, transactional consistency is very important. Imagine the server going down halfway through your delete/move; you would end up with data that has been deleted from production but never made it to the archive.
If you are unsure about how to protect your data by enforcing transactional consistency, please seek help from other SQL/SSIS experts on how to use transactions in SSIS.
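If both databases live on the same instance, one way to get that protection is an explicit transaction around a single DELETE ... OUTPUT INTO statement. This is only a sketch with placeholder table/database names; for a cross-server SSIS data flow you would instead look at the package/container TransactionOption setting:
begin try
    begin transaction;

    -- Delete from production and archive the deleted rows in one atomic unit
    delete ProductionTable
    output deleted.ValueDate, deleted.Data
    into ArchiveDb.dbo.ArchiveTable (ValueDate, Data)
    where ValueDate <= dateadd(year, -3, getdate());

    commit transaction;
end try
begin catch
    if @@trancount > 0
        rollback transaction;
    -- Re-raise the error (THROW needs SQL Server 2012+; use RAISERROR on older versions)
    throw;
end catch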
Create 2 OLE DB Connection Managers. Name them Production and Archive and have them point to the correct servers and databases. These CMs are what SSIS uses to push and pull data from the databases.
Add a Data Flow Task. A DFT is the executable that allows row-by-row manipulation of the data. Double-click on the Data Flow Task. Once inside, add an OLE DB Source and an OLE DB Destination to the canvas. The OLE DB Source is where the data will come from, while the OLE DB Destination provides the insert power.
The logic you would want to implement is a Delete first approach, much as I outlined in the other answer.
DELETE
    DF
OUTPUT
    DELETED.*
FROM
    dbo.DeleteFirst AS DF
WHERE
    -- older than 3 years: note the negative offset and the "year" datepart
    DF.RecordDate < DATEADD(year, -3, CURRENT_TIMESTAMP);
This query will delete all the rows older than 3 years and push them into the data flow. In your OLE DB Source, make the following configuration changes:
change the Connection Manager from Archive to Production
change the query type from "Table or View" to "Query"
paste your query and click the Columns tab to double check the query parsed
Connect the OLE DB Source to the OLE DB Destination. Double click on the OLE DB Destination and configure it
Verify the Connection Manager is the Archive
Ensure the Access Mode is "Table or View - Fastload" (name approximate)
You might need to check the "Keep identity" option, depending on your table design - if you have an identity column and you want ID 10 from the production system to be ID 10 in the Archive system, then check it
Select the actual table
On the Mappings tab, ensure that all the columns are mapped. SSIS does this automatically by matching names, so there shouldn't be a problem.
If you do not need to span an instance, the above logic can be condensed into a single Execute SQL Task:
DELETE
    DF
OUTPUT
    DELETED.*
    INTO ArchiveDatabase.dbo.DeleteFirst
FROM
    dbo.DeleteFirst AS DF
WHERE
    DF.RecordDate < DATEADD(year, -3, CURRENT_TIMESTAMP);
Also note that with this approach, if you have identity columns, you will need to provide an explicit column list and turn the IDENTITY_INSERT property on and off.
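A sketch of what that might look like, assuming the archive table has an identity column named Id plus the RecordDate column (the names and column list are placeholders for illustration):
-- Allow explicit values to be written into the archive table's identity column
SET IDENTITY_INSERT ArchiveDatabase.dbo.DeleteFirst ON;

DELETE
    DF
OUTPUT
    DELETED.Id
    , DELETED.RecordDate
    INTO ArchiveDatabase.dbo.DeleteFirst (Id, RecordDate)  -- explicit column list
FROM
    dbo.DeleteFirst AS DF
WHERE
    DF.RecordDate < DATEADD(year, -3, CURRENT_TIMESTAMP);

SET IDENTITY_INSERT ArchiveDatabase.dbo.DeleteFirst OFF;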
Have you considered table partitioning instead? You can move the old records to a totally different disk and still leave them available in the same table. It can also help with performance in some cases... all without an SSIS package.
I am currently using a database that is poorly designed, with a slow pipeline, so I decided to copy a small portion of the database (15 tables), and for some of those tables bring over only part of the data; for example, I only want the rows that have a certain id.
But this is not a one-time move: anything that is added to the old database needs to be added to the new one on an hourly basis. My research has led me to SSIS, which may have a way of accomplishing this, but I have found no clear examples of how it is done, if in fact it is possible. Thanks in advance.
Yes, it is possible. You can schedule your SSIS package through SQL Agent to run on an hourly basis.
For each table, you can drag a Data Flow Task onto the control flow. Inside the DFT, you need to place an OLE DB Source component, a Lookup, a Data Conversion (if the types differ between the source and target tables) and an OLE DB Destination.
OLE DB Source component: create a variable of type String and, in its expression, write the SQL query that fetches the data based on the ID. Now use this variable in the source component.
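A minimal sketch of the kind of query such a variable might hold, with placeholder table/column names and a hard-coded id where the SSIS expression would normally concatenate in the variable's value:
-- Pull only the rows for the id of interest
-- (dbo.SourceTable and SomeId are placeholders for illustration)
select *
from dbo.SourceTable
where SomeId = 42;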
Lookup: select your destination table as the reference and join the primary key column from the source to the primary key column of the destination table. It acts similar to an inner join query: after joining the two tables on the primary key, select the columns you need from the reference table.
OLE DB Destination: simply select your target table and map the columns from the Lookup's no-match output. If you need to update values from the source, send the Lookup's match output to a staging table and then use an Execute SQL Task to run the update query (or use an OLE DB Command transformation for row-by-row updates).
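A minimal sketch of such a set-based update from a staging table, with placeholder names:
-- Update the target from the staged matched rows
-- (dbo.TargetTable, dbo.StagingTable, SomeId and SomeColumn are placeholders)
update t
set t.SomeColumn = s.SomeColumn
from dbo.TargetTable as t
inner join dbo.StagingTable as s
    on s.SomeId = t.SomeId;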
For scheduling, please go through the link below and the related SO post:
Scheduling of SSIS package