I have created an SSIS package (see below) to import data from an external SQL query into a SharePoint 2007 list. The data imports fine, but when the package is run again to update the data it duplicates the records. I'm guessing that because there is no link between the SharePoint ID of the imported records and the data from my SQL query, the routine has no idea what to update and just creates new records. How do I prevent this and allow my data to be updated in the SharePoint list?
If you are setting the key ID field in your SharePoint list target it will perform an update; otherwise the default is an insert. It sounds like you have not mapped the ID.
You can either
Set (map) the ID column, thus forcing the SharePoint destination component to perform an update. Have a look at this example by Chris Kent.
Limit your source SELECT statement based on the last inserted record inside the SharePoint list. Prior to the data flow task, you would need to select the max (date or key) from SharePoint and set an expression on your data source to include this value in the WHERE clause, so that only new records are selected. This has the added benefit of limiting the amount of data travelling across the network, and your existing insert setup would keep working.
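As a rough sketch of the second option (all names here are hypothetical: a Modified column on the source table and a package variable such as User::LastLoadDate populated by an earlier Execute SQL Task that reads the maximum date from the SharePoint list), the source query could look something like this, with the variable supplied either as a mapped parameter or via a property expression on the source:

-- Minimal sketch: only pull rows newer than the last date already in SharePoint
SELECT  s.SourceKey,
        s.Title,
        s.Modified
FROM    dbo.SourceTable AS s
WHERE   s.Modified > ?;   -- the ? parameter maps to the hypothetical User::LastLoadDate variable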
I'm working on an SSIS project that pulls data from Excel and loads it to an Oracle database every month. I plan to pull data from the Excel file and load it to an Oracle stage table. I will be using a merge statement because the data that gets loaded each month is a rolling 12-month list and the data can change, so I need to be able to INSERT when records don't match or UPDATE when they do. My control flow looks like this: Truncate Stage Table (to clear out the table from the last package run) ---> Data Flow from Excel to Stage Table ---> Merge to Target Table in Oracle.
My problem is that the data in the source Excel file doesn't have any unique columns I could use as a primary key or a composite key, as it is possible (although very unlikely) that a new record could have exactly the same information. I am unable to use "generated always as identity" because my SSIS package needs to truncate the stage table at the beginning of each job. This would generate the same ID numbers in the new load and create problems in the target table.
Any suggestions as to how I can get around this problem?
Welcome to SO and ETL. Instead of using a staging table, use two sources in SSIS: the Excel file and the existing production table. Sort both inputs and then perform a merge join on the unique identifier. From there, use a Derived Column transformation to add a new column called 'Action' which will mark a row as INSERT, UPDATE or DELETE based on whether the join key is NULL. So:
NULL from file means DELETE (not in file, in database)
NULL from database means INSERT (in file, not in database)
Not NULL for both means UPDATE (in file, in database)
From there, use a Conditional Split to send rows to either an OLE DB Destination (INSERT) or an OLE DB Command (UPDATE or DELETE); a rough sketch of those commands is shown after the note below. You can now remove the stage environment and MERGE command from your process. This has the added benefit of removing the ETL load from the SQL Server, assuming SSIS is running on a separate server.
Note: The sort transformation has the option to remove duplicates.
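If you go this route, here is a minimal sketch of the statements the OLE DB Command branches might run (ProductionTable, BusinessKey, ColumnA and ColumnB are hypothetical names; each ? is mapped to a data flow column in the OLE DB Command editor):

-- UPDATE branch: row exists in both the file and the database
UPDATE dbo.ProductionTable
SET    ColumnA = ?,
       ColumnB = ?
WHERE  BusinessKey = ?;

-- DELETE branch: row exists in the database but not in the file
DELETE FROM dbo.ProductionTable
WHERE  BusinessKey = ?;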
Please consider this scenario:
I have a table in my database. I want to move this data into my OLAP database using SSIS. I can move all records from my table to the OLAP database. The problem is that I don't know how to apply changes in the OLAP environment. For example, if just 100 records of my table were changed, how can I apply those changes without copying all records from scratch?
How can I merge these two tables?
Thanks
There are two main approaches to this:
Lookup Transformation --> OLE DB Command / OLE DB Destination
Load all data to a staging table and perform the MERGE using SQL.
My preference is for the latter because the update is set-based, but I do use the former where I know the load will be predominantly inserts.
With the former you will end up with a data flow task something like:
This is an OLE DB Source from the OLTP database, which then looks up against your OLAP database to retrieve the surrogate key. Where there is no match it simply inserts a new record into the OLE DB Destination; when there is a match it does a conditional split, and if any fields have changed it uses the OLE DB Command to update the OLAP table.
It can obviously get much more complicated than this, but this covers the simplest example.
You can also use the Slowly Changing Dimension transformation, which opens a wizard to create the data flow for you; this again gets a bit more complex.
As mentioned though, my preference is for a staging table and a set-based update, because the OLE DB Command executes on a row-by-row basis, so if you are updating millions of records it will take a long time. You can simply create a staging table in your OLAP database, move the data in with a simple OLE DB Source and Destination, and then use MERGE to update the OLAP table:
MERGE OLAP AS o
USING Staging AS s
    ON  o.BusinessKey = s.BusinessKey
    AND o.Type2SCD = s.Type2SCD
    AND o.Active = 1
WHEN MATCHED AND o.Type1SCD != s.Type1SCD THEN
    UPDATE SET Type1SCD = s.Type1SCD
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, Type1SCD, Type2SCD, Active, EffectiveDate)
    VALUES (s.BusinessKey, s.Type1SCD, s.Type2SCD, 1, GETDATE())
WHEN NOT MATCHED BY SOURCE AND o.Active = 1 THEN
    UPDATE SET Active = 0;
The above assumes you have one active record per business key and both type 1 and type 2 slowly changing dimension attributes. It will insert a new record where there is no match on BusinessKey and Type2SCD, and it will set any active records in the target that no longer exist in the staging table to inactive. When there is a match but the type 1 attribute is different, that attribute is updated.
It is worth noting that MERGE has its downsides, and you may want to write your set-based upserts as separate INSERT and UPDATE statements. One major issue I have come across is that on all my dimension tables I have a unique filtered index on my BusinessKey field WHERE Active = 1 to ensure there is only one active record; the MERGE I have written should work fine with that, but doesn't, as detailed in this Connect item. Having to add OPTION (QUERYTRACEON 8790); to the end of all the MERGE statements in my ETL was not the end of the world, but it was not ideal.
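For reference, a rough set-based equivalent of the MERGE above, written as separate statements against the same hypothetical OLAP and Staging tables, might look like this:

-- Type 1 update where an active row matches on the business key and type 2 attribute
UPDATE o
SET    o.Type1SCD = s.Type1SCD
FROM   OLAP AS o
JOIN   Staging AS s
  ON   s.BusinessKey = o.BusinessKey
 AND   s.Type2SCD = o.Type2SCD
WHERE  o.Active = 1
  AND  o.Type1SCD != s.Type1SCD;

-- Deactivate active rows that no longer have a matching row in Staging
UPDATE o
SET    o.Active = 0
FROM   OLAP AS o
WHERE  o.Active = 1
  AND  NOT EXISTS (SELECT 1 FROM Staging AS s
                   WHERE s.BusinessKey = o.BusinessKey
                     AND s.Type2SCD = o.Type2SCD);

-- Insert rows that do not yet exist as an active record
INSERT INTO OLAP (BusinessKey, Type1SCD, Type2SCD, Active, EffectiveDate)
SELECT s.BusinessKey, s.Type1SCD, s.Type2SCD, 1, GETDATE()
FROM   Staging AS s
WHERE  NOT EXISTS (SELECT 1 FROM OLAP AS o
                   WHERE o.BusinessKey = s.BusinessKey
                     AND o.Type2SCD = s.Type2SCD
                     AND o.Active = 1);

Run in this order inside a single transaction, this gives broadly the same net effect without relying on MERGE.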
Sounds like you want to use incremental loads.
The first five tutorials on this page should point you in the right direction - I found them really useful in the past.
I have a table that is a replicate of a table from a different server.
Unfortunately I don't have access to the transaction information, and all I have is a table that shows the "as is" information, plus an SSIS package that replicates the table on my server every day (the table gets truncated and new information is pulled every night).
Everything has been fine and good, but I want to start tracking what has changed, i.e. I want to know if a new row has been inserted or if the value of a column has changed.
Is this something that could be done easily?
I would appreciate any help.
The SQL version is SQL Server 2012 SP1 | Enterprise
If you want to do this for a particular table then you can go for an SCD (Slowly Changing Dimension) transformation in the SSIS data flow, which will keep the history records in a different table
or
you can enable CDC (Change Data Capture) on that table. CDC will help you monitor every DML operation on that table: the modified rows are written to a system change table.
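For example, a minimal sketch of enabling CDC (the schema and table name are hypothetical; CDC needs SQL Server Agent running and, on SQL Server 2012, Enterprise edition, which matches your setup):

-- Enable CDC at the database level (run inside the target database)
EXEC sys.sp_cdc_enable_db;

-- Enable CDC for the table you want to track
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'MyReplicatedTable',  -- hypothetical table name
     @role_name     = NULL;                  -- NULL means no gating role

-- Changes then appear in the generated change table, e.g. cdc.dbo_MyReplicatedTable_CT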
I have one excel file that I want to import into two different tables, tblUni and tblUser.
I have a third table which contains the IDs from the other two tables:
tblUni_Students
Id
UniId
StudentId
What I need is that when I import the Excel data into the first two tables, for each record, the newly created IDs are also inserted into the tblUni_Students table.
Using SSIS, I have managed to import the data into the two SQL destinations, but I cannot seem to take the new IDs from those destinations and insert them into the lookup table.
Can anyone advise please. Thanks.
It's a bit difficult to answer without knowing the target database or the structure of the data, but generally speaking this would be much better done by adding the data to a "load" table, i.e. one whose sole purpose is to temporarily hold data while you process it. You would then update the tblUser, tblUni and tblUni_Students tables from the load area using SQL statements, either via a stored procedure or via an Execute SQL Task component.
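As a rough illustration, assuming a hypothetical load table LoadSheet with natural-key columns UniName and StudentName, and identity Id columns on tblUni and tblUser, the SQL could look something like this:

-- 1. Insert any universities that are not already present
INSERT INTO tblUni (UniName)
SELECT DISTINCT l.UniName
FROM   LoadSheet AS l
WHERE  NOT EXISTS (SELECT 1 FROM tblUni AS u WHERE u.UniName = l.UniName);

-- 2. Insert any students that are not already present
INSERT INTO tblUser (StudentName)
SELECT DISTINCT l.StudentName
FROM   LoadSheet AS l
WHERE  NOT EXISTS (SELECT 1 FROM tblUser AS x WHERE x.StudentName = l.StudentName);

-- 3. Populate the link table by joining back on the natural keys,
--    picking up the newly generated identity values
INSERT INTO tblUni_Students (UniId, StudentId)
SELECT u.Id, x.Id
FROM   LoadSheet AS l
JOIN   tblUni  AS u ON u.UniName = l.UniName
JOIN   tblUser AS x ON x.StudentName = l.StudentName
WHERE  NOT EXISTS (SELECT 1 FROM tblUni_Students AS us
                   WHERE us.UniId = u.Id AND us.StudentId = x.Id);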
You could also do it with an OLE DB Command component, where the command inserts the values into the table and then, in the same component, returns the generated identity. Assign the generated identity to a new column in the output, and now you have all your data plus the generated identity in the data flow.
This will be processed one row at a time, so it will be slow. Personally I'd put it in a staging table and do it as Ciarán described.
I am currently using a database that is poorly designed, with a slow pipeline, so I decided to copy a small portion of the database (15 tables) and only bring over part of the data in those tables; for example, I want to bring over only the rows that have a certain ID.
But this is not a one-time move: I need everything that is added to the old database to be added to the new one on an hourly basis. My research has led me to SSIS, which may have a way of accomplishing this, but I have found no clear examples of how it is done, if in fact it is possible. Thanks in advance.
Yes, it is possible. You can schedule your SSIS package through SQL Server Agent to run on an hourly basis.
For a table, you can drag a Data Flow Task onto the control flow. Inside the DFT, you need to place an OLE DB Source component, a Lookup, a Data Conversion (if the types are different in the source and target tables) and an OLE DB Destination.
OLE DB Source component: create a variable of type String and, in its expression, write your SQL query to fetch the data based on the ID. Now use this variable in the source component (data access mode "SQL command from variable").
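For example, the query stored in that variable might have this hypothetical shape (the table, column and filter value are placeholders; the real value would be injected by the variable's expression):

SELECT  t.*
FROM    dbo.SourceTable AS t
WHERE   t.SomeId = 42;   -- hypothetical filter; the actual value comes from the expression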
Lookup: you need to select your destination (target) table as the lookup reference and join the primary key column from the source and destination tables. It acts like an inner join query. After joining on the primary keys of both tables, select the columns you need returned from the lookup.
OLE DB Destination: simply select your target table and map the columns from the Lookup's no-match output. If you need to update the values from the source, then use the Lookup's match output, connect it to an OLE DB Command, and write the update query (see the sketch below).
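A rough sketch of that parameterized update, with hypothetical table and column names; each ? is mapped to a data flow column in the OLE DB Command editor:

-- Update the target row that matched on the primary key
UPDATE dbo.TargetTable
SET    ColumnA = ?,
       ColumnB = ?
WHERE  PrimaryKeyId = ?;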
Please go through the link and SO post below:
Scheduling of SSIS package