Table Variables in SSIS - sql

In one SQL Task can I create a table variable
DELCARE #TableVar TABLE (...)
Then in another SQL Task or DataSource destination and select or insert into the table variable?
The other option I have considered is using a Temp Table.
CREATE TABLE #TempTable (...)
I would prefer to use Table Variable so that it remains in memory. But can use temp table if it is not possible to use table variable. Also I cannot use the record set destination as I need to preform straight SQL tasks on it later on.
The use case that this is trying to solve is essentially performing a transformation in the stead of BizTalk. There is a very large flat file to flat file transformation that BizTalk has to transform unfortunately the data volume would produce unacceptable load on the BizTalk server so the idea is to off load it to SSIS. However, it is not a simple row to row transformation, there are different types of rows which have relations to each other. The first task in SSIS is to load the row into appropriate (temp) tables, then in the second data task a select is preformed with the correct format for output.

You could use some of the techniques in this post: http://consultingblogs.emc.com/jamiethomson/archive/2006/11/19/SSIS_3A00_-Using-temporary-tables.aspx
especially the ones about using RetainSameConnection=TRUE on the connection manager.
I would be interested to see more information about what use case you have that requires you to write out data to a temp table or table variable before further SSIS processing. Couldn't you take care of all of the SQL required steps in your source query before you start processing the dataflow with SSIS?

Table variables are not kept solely in memory and can be written to disk under memory pressure. I tend to use table variables for very small lookups. If you Need to push a table into SQL Server due to necessary and complex transformations, then use a 'permanent' temp table that is truncated within the SSIS package prior to insert. Simple and will get what you need done.

The SSIS package would be run in a job. I assume it runs inside a SQL job. In that case, using a temp table won't harm. SQL Jobs are generally run after office hours so it does not matter.

Related

How to set up a staging table in SQL with SSIS dataflow?

I am trying to create a dataflow in SSIS where the source data originates from an excel file and reaches to a temporary staging table in a SQL server where I can add various stored procedures to the data.
The dataflow that I have created stores the data permanently on what is supposed to be the staging area.
I would like to get some ideas on creating the staging table in SQL with the SSIS dataflow.
your question is a bit confusing. I suppose that you are maybe trying to make the data loaded in the table of the staging area temporary without keeping the past loaded data.
If I'm right what you're trying to accomplish is a "full resfresh" data flow.
From your description I assume you alerady have the staging table (so no nedd to CREATE it) but you need to truncate it at every run. You can achive this by using a Execut SQL Task element to the control flow with a TRUNCATE TABLE <YOUR TABLE NAME> in it. The data flow loading the data must be in dependency of this task with the result of truncating your table at every run.
If you need to CREATE a table you can do it in the control flow with the Execute SQL Task (you can execute any kind of query with this task), rember to set correctly the connection manager of the task.

SSIS Incremental Load-15 mins

I have 2 tables. The source table being from a linked server and destination table being from the other server.
I want my data load to happen in the following manner:
Everyday at night I have scheduled a job to do a full dump i.e. truncate the table and load all the data from the source to the destination.
Every 15 minutes to do incremental load as data gets ingested into the source on second basis. I need to replicate the same on the destination too.
For incremental load as of now I have created scripts which are stored in a stored procedure but for future purposes we would like to implement SSIS for this case.
The scripts run in the below manner:
I have an Inserted_Date column, on the basis of this column I take the max of that column and delete all the rows that are greater than or equal to the Max(Inserted_Date) and insert all the similar values from the source to the destination. This job runs evert 15 minutes.
How to implement similar scenario in SSIS?
I have worked on SSIS using the lookup and conditional split using ID columns, but these tables I am working with have a lot of rows so lookup takes up a lot of the time and this is not the right solution to be implemented for my scenario.
Is there any way I can get Max(Inserted_Date) logic into SSIS solution too. My end goal is to remove the approach using scripts and replicate the same approach using SSIS.
Here is the general Control Flow:
There's plenty to go on here, but you may need to learn how to set variables from an Execute SQL and so on.

SSIS Alternatives to one-by-one update from RecordSet

I'm looking for a way to speed up the following process: I have a SSIS package that loads data from Excel files on a weekly basis to SQL Server. There are 3 fields: Brand, Date, Value.
In the dataflow, I check for existing combinations of Brand+Date, and new combinations go to the table directly, the existing ones go to a RecordSet destination for updates:
The next step is to update the Value of the existing combinations:
As you can see, there are thousands of records to update, and it takes too long. The number of records tend to grow week by week. Please suggest.
The fastest way will be do this inside a Stored procedure using ELT (Extract Load Transform) approach.
Push all data from excel as is into a table(called load to a staging table in theory). Since you do not seem to be concerned with data validation steps, this table can be a replica of final destination table columns.
Next step is to call a stored procedure using Execute SQL task. Inside this procedure you can put all your business logic. Since this steps with native data manipulation on SQL server entities, it is the fastest alternative.
As a last part, please delete all entries from the staging table.
You can use indexes on staging table to make the SP part even faster.

SSIS delete duplicate data

I have a problem for making a SSIS package.Can anyone help?
Here is the case:
I have two tables: A & B, and the structure is the same. But they are stored on different servers.
I have already made an SSIS package to transfer data from A to B (about one million rows at a time, which takes between one and two minutes).
After that I want to delete table A's data, after having been transfered to B. The SSIS package I wrote would follow that. I use a merge join and Conditional Split command to select the same data.
After that I use the OLE DB Command to delete Table A's data (just use "Delete RE_FormTo Where ID=?" SQLCommand to delete). It can work, but it is too slow! It took about one hour to delete the duplicate data! Does anyone know of a more efficient way of doing this?
SSIS Package Link
The execution is bound to be slow because of the poor SSIS package design .
Kindly refer the document Best Practices of SSIS Design
Let me explain you the mistakes which are there in your package .
1.You are using a Blocking transformation (Sort Component) .These transformations doesn't reuse the input buffer but create a new buffer for output and mostly they are slower than Synchronous components such as Lookup ,Derived Column etc which try to re use the input buffer .
As per MSDN
Do not sort within Integration Services unless it is absolutely necessary. In
order to perform a sort, Integration Services allocates the memory space of the
entire data set that needs to be transformed. If possible, presort the data before
it goes into the pipeline. If you must sort data, try your best to sort only small
data sets in the pipeline. Instead of using Integration Services for sorting, use
an SQL statement with ORDER BY to sort large data sets in the database – mark
the output as sorted by changing the Integration Services pipeline metadata
on the data
source.
2.Merge Join is a semi-blocking transformation which does hamper the performance but much less than Blocking transformation
There are 2 ways in which you can solve the issue
Use Lookup
Use Execute SQL Task and write the Merge SQL
DECLARE #T TABLE(ID INT);
Merge #TableA as target
using #TableB as source
on target.ID=source.ID
when matched then
Delete OUTPUT source.ID INTO #T;
DELETE #TableA
WHERE ID in (SELECT ID
FROM #T);
After you Joining both tables just insert Sort element, he gonna remove duplicates....
http://sqlblog.com/blogs/jamie_thomson/archive/2009/11/12/sort-transform-arbitration-ssis.aspx

How to do Data Flow Task from/to the same table?

I am using SQL Server 2005 SSIS and we are using the Data Flow Task to move data from one table to another. This works well. Now we have another requirement to do data update from the same table using this approach.
Is this possible to use the same approach for as follow:
We have a dataset from Table A based on complex query
We update back to the Table A
The normal query UPDATE INTO is not an option due it takes awhile to process and we can't see the data movement like we did for Data Flow Task.
Any guidance or anything that will be good.
Thanks
either:
write it to a temporay table and do the update into with a single SQL task after you processed everything
break it down into smaller chunks based on SSIS variables and OFFSET and use a FOR/FOREACH LOOP
Read the data with a data source in a data flow task, and use ole db command in the data flow to update the data in the same table. If there is no locking when you read and only row-level locking when you update, that should work