Verify multiple tables and copy data in SSIS/BIML? - sql

I have a package with about 6 to 7 data flow tasks. Within those data flow tasks, I have anywhere from 5 to 70 tasks that copy data from a source (an Oracle database) to a destination (a SQL Server database). I need to do a count on the source table and copy the data only if the source is not empty. At present I have an Execute SQL Task that truncates all the tables; I would like to truncate only if my parameter is > 0. But with the number of tables I use (177), I can't afford to create a variable for each one to hold the result of the count and then test it. Can I make something work with BIML? Can I use a stored procedure and loop through it? I need some advice.
EDIT:
I think I did not explain myself correctly. I have multiple data flow tasks with a lot of source-to-destination copies. In my control flow, I have an Execute SQL Task that truncates all 177 of my tables. I need to do a count on all of the source tables and store the results so I can pass them to my Execute SQL Task. After that I want to check each count: if it is not > 0, I would skip the task. Is there any easier way to do this than to create 177 variables?
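One pattern that avoids creating 177 scalar variables is to let a single Execute SQL Task return one result set of per-table counts and capture it in a single SSIS Object variable. Below is a minimal T-SQL sketch of that idea; the linked server ORACLE_SRC, the metadata table dbo.SourceTableList, and the procedure dbo.usp_GetSourceCounts are assumptions for illustration, not objects from the original package.

    -- Hypothetical list of the 177 source tables (assumed, not from the post)
    -- CREATE TABLE dbo.SourceTableList (TableName sysname NOT NULL);

    CREATE PROCEDURE dbo.usp_GetSourceCounts
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @sql nvarchar(max) = N'';

        -- Build one UNION ALL query with a COUNT(*) per source table,
        -- read over a linked server to the Oracle source (names assumed).
        SELECT @sql = @sql
            + N'UNION ALL SELECT ''' + TableName + N''' AS TableName, COUNT(*) AS RowCnt '
            + N'FROM ORACLE_SRC..SRC_OWNER.' + TableName + N' '
        FROM dbo.SourceTableList;

        -- Strip the leading 'UNION ALL ' (10 characters) and run it as one result set
        SET @sql = STUFF(@sql, 1, 10, N'');
        EXEC sys.sp_executesql @sql;
    END

An Execute SQL Task with ResultSet set to "Full result set" can store that output in a single Object variable, and a Foreach Loop (ADO enumerator) or expression-based precedence constraints can then decide per table whether to truncate and copy, without 177 individual variables.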
Thanks.

I hope I'm not too late for you. You can use bimlonline.com to reverse-engineer your package.
Bimlonline.com is free.

Related

SSIS Incremental Load - 15 mins

I have two tables: the source table is on a linked server and the destination table is on the other server.
I want my data load to happen in the following manner:
Every day at night, a scheduled job does a full dump, i.e. truncates the destination table and loads all the data from the source into it.
Every 15 minutes, an incremental load runs, since data gets ingested into the source every second and I need to replicate the same on the destination too.
For the incremental load I have, for now, created scripts stored in a stored procedure, but for the future we would like to implement SSIS for this case.
The scripts run in the below manner:
I have an Inserted_Date column; based on this column I take its max, delete all destination rows that are greater than or equal to Max(Inserted_Date), and insert all the matching rows from the source into the destination. This job runs every 15 minutes.
How do I implement a similar scenario in SSIS?
I have worked in SSIS with Lookup and Conditional Split on ID columns, but the tables I am working with have a lot of rows, so the lookup takes a lot of time and is not the right solution for my scenario.
Is there any way I can get the Max(Inserted_Date) logic into an SSIS solution too? My end goal is to remove the script-based approach and replicate the same approach in SSIS.
Here is the general Control Flow:
There's plenty to go on here, but you may need to learn how to set variables from an Execute SQL Task, and so on.
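For reference, the statements that the Execute SQL Tasks in such a control flow would run might look roughly like the T-SQL below; the names Server1, SourceDb, dbo.SourceTable and dbo.DestTable are placeholders rather than objects from the question.

    -- 1) Get the high-water mark from the destination
    --    (Execute SQL Task with a Single row result set mapped to an SSIS variable)
    DECLARE @MaxInsertedDate datetime;
    SELECT @MaxInsertedDate = MAX(Inserted_Date) FROM dbo.DestTable;

    -- 2) Remove destination rows at or beyond the high-water mark, then re-insert them
    --    from the source so late-arriving rows with that timestamp are not missed.
    DELETE FROM dbo.DestTable
    WHERE Inserted_Date >= @MaxInsertedDate;

    INSERT INTO dbo.DestTable (Col1, Col2, Inserted_Date)
    SELECT s.Col1, s.Col2, s.Inserted_Date
    FROM [Server1].[SourceDb].dbo.SourceTable AS s   -- linked-server source (assumed name)
    WHERE s.Inserted_Date >= @MaxInsertedDate;

In SSIS, step 1 maps to an Execute SQL Task that stores the value in a variable, and step 2 can be a second Execute SQL Task (or a Data Flow with a parameterised source query) that receives that variable as a parameter.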

Do-while loop with GPDB using Talend

I have a very large data set in GPDB from which I need to extract close to 3.5 million records. I use this to build a flat file, which is then used to load different tables. I use Talend, and do a select * from the table using the tGreenplumInput component and feed that to a tFileOutputDelimited. However, due to the very large volume of the file, I run out of memory while executing it on the Talend server.
I lack super-user permissions and am unable to do a \copy to output it to a CSV file. I think something like a do-while or a tLoop with a more limited number of rows might work for me, but my table doesn't have any row_id or uid to distinguish the rows.
Please help me with suggestions on how to solve this. I appreciate any ideas. Thanks!
If your requirement is to load data into different tables from one table, then you do not need to load into a file first and then from the file into the tables.
There is a component named tGreenplumRow which allows you to write direct SQL queries (DDL and DML) in it.
Below is a sample job:
If you notice, there are three insert statements inside this component. They will be executed one by one, separated by semicolons.
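As a rough illustration of what such a tGreenplumRow component might contain (the table names below are invented, since the original job's names are not shown):

    -- Three statements run in sequence by the single tGreenplumRow component
    INSERT INTO target_table_a SELECT col1, col2 FROM big_source_table WHERE region = 'A';
    INSERT INTO target_table_b SELECT col1, col3 FROM big_source_table WHERE region = 'B';
    INSERT INTO target_table_c SELECT col1, col4 FROM big_source_table WHERE region = 'C';

Because the data moves entirely inside Greenplum, nothing has to be pulled through the Talend JVM, which avoids the out-of-memory problem.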

Execute SQL task (SSIS) and then insert the result set into a table on a different server

This is more of a generic question:
I have file1.sql, file2.sql and file3.sql in a folder. I can run a Foreach container to loop through the files and execute them, but I need each result set to go to its respective table sitting on a different server:
file1 result set --> Server2.TableA
file2 result set --> Server2.TableB .. etc
How can this be achieved with SSIS techniques?
You can do this with a Script Task in the Foreach Loop that analyses the result set and inserts it into the appropriate destination table.
You could also put all the records into a staging table on one server, with additional columns for the server they will go to and an IsProcessed bit field.
At this point you could do any clean-up required of the data.
Then create a separate data flow for each server to grab the unprocessed records for that server. After they are sent, mark the records as processed.
This will work if you only have a few servers. If there are many possibilities, or you expect the number to keep changing, I would go with @TabAlleman's suggestion.
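A minimal T-SQL sketch of that staging-table approach, with invented names (dbo.ResultStaging, TargetServer, IsProcessed) purely for illustration:

    -- Staging table holding every result set plus routing/processing metadata
    CREATE TABLE dbo.ResultStaging (
        StagingId    int IDENTITY(1,1) PRIMARY KEY,
        TargetServer sysname       NOT NULL,  -- which server the row should be sent to
        TargetTable  sysname       NOT NULL,
        Payload      nvarchar(max) NOT NULL,  -- or the real typed columns of the result sets
        IsProcessed  bit           NOT NULL DEFAULT (0)
    );

    -- Each per-server data flow reads only its own unprocessed rows ...
    SELECT StagingId, Payload
    FROM dbo.ResultStaging
    WHERE TargetServer = 'Server2' AND IsProcessed = 0;

    -- ... and marks them processed once they have been delivered
    UPDATE dbo.ResultStaging
    SET IsProcessed = 1
    WHERE TargetServer = 'Server2' AND IsProcessed = 0;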
thestralFeather,
If you are new to SSIS, refer to MSDN's tutorial on looping utilizing SSIS here. If you look at this page within the tutorial, you will see the output destination in the data flow. @Tab Allerman and @HLGEM have provided good advice. When you look at the pages I've referred you to, just think in terms of 2 separate loops dropping data to a single location that you can manage in a target data flow.

Transferring data from an Excel file (multiple sheets) to SQL (multiple tables) in a step-by-step process

Talend
I want to transfer data from Excel to SQL. My Excel workbook has 20 different sheets whose data is to be copied into 20 different tables.
I want to create a single job that performs this task in a step-by-step manner.
My tables have dependencies, so it is very important to copy data into these tables in a specific order.
I want to have 20 steps in a single job.
Any help (online source / video) regarding how to go about this task would be appreciated.
I am thinking about using a trigger -> OnComponentOk for this. I will connect the 1st output with the 2nd input using this, and so on. I believe this will work, but after transferring the data into the last table I want to call a stored procedure, and I am unsure how to do that. Let me know if you have any idea about this.
You have to design the job like below to solve this issue:
openDbConnection --OnComponentOk--> tFileInputExcel (sheet 1) --> tSqlOutput
and add the rest of the subjobs after the previous one using the OnSubjobOk link.
Refer to the image for more details.
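For the stored-procedure call at the end, one option is a final subjob (again linked with OnSubjobOk) containing a tMSSqlRow or tDBRow component that simply executes the procedure; the procedure name below is a placeholder:

    -- Run in a tMSSqlRow / tDBRow component as the last subjob (procedure name is assumed)
    EXEC dbo.usp_AfterExcelLoad;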

SSIS storing logging variables in a derived column

I am developing SSIS packages that consist of 2 main steps:
Step 1: Grab all sorts of data from existing legacy systems and dump them into a series of staging tables in my database.
Step 2: Move the data from my staging tables into a more relational set of tables that I'm using specifically for my project.
In step 1 I'm just doing a bulk SELECT and a bulk INSERT; however, in step 2 I'm doing row-by-row inserts into my tables using OLEDB Command tasks so that I can log very specific row-level activity of everything that's happening. Here is my general layout for step 2 processes.
[screenshot: http://dl.dropbox.com/u/2468578/screenshots/step_1.png]
You'll notice 3 OLEDB tasks: 1 for the actual INSERT, and 2 for success/fail INSERTs into our logging table.
The main thing I'm logging is source table/id and destination table/id for each row that passes through this flow. I'm storing this stuff in variables and adding them to the data flow using a Derived Column so that I can easily map them to the query parameters of the stored procedures.
[screenshot: http://dl.dropbox.com/u/2468578/screenshots/step_3.png]
I've decided to store these logging values in variables instead of hard-coding the values in the SqlCommand field on the task, because I'm pretty sure you CAN'T put variable expressions in that field (i.e. exec storedproc @[User::VariableName], ..., ..., ...). So, this is the best solution I've found.
[screenshot: http://dl.dropbox.com/u/2468578/screenshots/step_2.png]
Is this the best solution? Probably not.
Is it good performance wise to add 4 logging columns to a data flow that consists of 500,000 records? Probably not.
Can you think of a better way?
I really don't think calling an OLEDBCommand 500,000 times is going to be performant.
If you are already going to staging tables, load it all into a staging table and take it from there in T-SQL, or even another data flow (or to a raw file and then something else, depending on your complete operation). A bulk insert is going to be hugely more efficient.
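As a rough sketch of that set-based alternative, the row-level log can be produced in the same statement with an OUTPUT clause instead of 500,000 OLE DB Command calls; dbo.Staging, dbo.Target, dbo.RowLog and their columns are invented names for illustration:

    -- One set-based INSERT from staging to target that also writes the row-level log
    INSERT INTO dbo.Target (BusinessKey, Col1, Col2)
    OUTPUT 'dbo.Staging', inserted.BusinessKey, 'dbo.Target'
        INTO dbo.RowLog (SourceTable, DestinationKey, DestinationTable)
    SELECT s.BusinessKey, s.Col1, s.Col2
    FROM dbo.Staging AS s;

The logging values that were being carried as Derived Columns (source table, destination table, ids) become constants or columns in this one statement, so the data flow itself stays a plain bulk insert.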
To add to Cade's answer: if you truly need the logging info on a row-by-row basis, your best bet is to leverage the OLE DB Destination and use one or both of the following transformations to add columns to the data flow:
Derived Column Transformation
Audit Transformation
This should be your best bet and shouldn't add much overhead.