Why does my SSIS package take so long to execute? - sql

I am fairly new to creating SSIS packages. I have a SQL Server 2008 table called BanqueDetailHistoryRef containing 10,922,583 rows.
I want to extract the rows that were inserted on a specific date (or dates) and insert them into a table on another server. I am trying to achieve this with an SSIS package whose data flow looks like this:
OLEDB Source (the table with the 10Million+ records) --> Lookup --> OLEDB Destination
On the Lookup I have set:
Now, the query (specified on the Lookup transformation):
SELECT * FROM BanqueDetailHistoryRef WHERE ValueDate = '2014-01-06';
takes around 1 second to run in SQL Server Management Studio, but the SSIS package described above takes a really long time to run (about an hour).
What is causing this? Is this the right way to achieve my desired result?

You didn't show how your OLEDB Source component was set up, but looking at the table names I'd guess you are loading the whole 10 million rows in the OLEDB Source and then using the Lookup to filter out only the ones you need. This is needlessly slow, because every row has to leave the source server and pass through the data flow before most of them are thrown away.
You can remove the Lookup completely and filter the rows in OLEDB source using the same query you had in the Lookup.
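To make the difference concrete, here is a rough sketch in Python. It is not SSIS: an in-memory SQLite database stands in for the SQL Server source, the Id column and the sample data are made up, and only the table name, the ValueDate column and the date come from the question. It compares pulling everything and filtering afterwards with filtering in the source query.

```python
import sqlite3

# In-memory SQLite stands in for the SQL Server source (an assumption for
# illustration). The Id column and the generated rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BanqueDetailHistoryRef (Id INTEGER PRIMARY KEY, ValueDate TEXT)"
)
rows = [(i, "2014-01-06" if i % 1000 == 0 else "2013-12-31") for i in range(1, 10001)]
conn.executemany("INSERT INTO BanqueDetailHistoryRef VALUES (?, ?)", rows)

# Slow pattern: pull every row out of the source, then discard most of them.
all_rows = conn.execute("SELECT * FROM BanqueDetailHistoryRef").fetchall()
filtered_late = [r for r in all_rows if r[1] == "2014-01-06"]

# Fast pattern: let the database filter, so only matching rows leave the source.
filtered_at_source = conn.execute(
    "SELECT * FROM BanqueDetailHistoryRef WHERE ValueDate = ?", ("2014-01-06",)
).fetchall()

rows_pulled_slow = len(all_rows)            # every row travelled through the pipeline
rows_pulled_fast = len(filtered_at_source)  # only the rows for the requested date
```

Both patterns end up with the same rows; the second one simply moves far fewer of them. In the package, that corresponds to putting the WHERE clause in the OLEDB Source's SQL command.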

Related

Update/Rewrite only a single row in SQL Server

I want to have only 1 row in my table. I am populating a row from Server1 to Server2 on a single table using SSIS. I am updating a table on Server2 with the execution end time that I get from a table on Server1. Here is the query I use to populate it:
SELECT TOP 1 EXEC_END_TIME
FROM cr_stat_execution cse
WHERE cse.EXEC_NAME = 'ETL'
ORDER BY exec_end_time DESC
The problem:
I want to update Server2's table with the most recent record only, overwriting the previous day's data. I don't want to keep a history in my table. How can I modify my query so that only the most recent data from Server1 ends up on Server2, without rows of history?
Your package will have two Connection managers. In this case, I'll assume OLEDB but ADO.NET or ODBC will also work and further assume they point to Server1 and Server2 and are named as such.
The pattern you are going to have is two Execute SQL Tasks in serial. The first Execute SQL Task will ask Server1 what the value of EXEC_END_TIME is and store that to an SSIS variable.
Create a variable named LastExec, set its data type to DateTime and initialize it to something like 1900-01-01 at midnight.
In the Execute SQL Task, change the result set type from None to Single Row and then, on the Result Set tab, map the 0th element to the variable.
See also: How to set a variable using a scalar-valued tSQL function in SSIS SQL Task
The second Execute SQL Task will run the update statement that Panagiotis describes, and the "magic" will be using the SSIS variable in the query.
UPDATE ThatTable SET ThatField=?
The ? is the placeholder for OLEDB and ODBC connection manager queries. The difference is that OLEDB uses a 0-based ordinal and ODBC a 1-based one. ADO.NET uses named parameters, so the query from the original comment would work there.
In the Parameter Mapping tab, you will need to associate the SSIS variable with ordinal position 0 of your query. Sample screen: Logging information when executing remotely SSIS from ASP.NET
Get the first Execute SQL Task working first. Once you can query the database and assign the result to a variable, getting the next one (set as a successor) in line should be a snap.
Data flow approach
You could continue with your data flow approach. Instead of an OLE DB Destination, you'd use an OLE DB Command. Use the same query and, this time, map the source column to the zeroth parameter.
It's overkill, which is why I did not advocate for that approach.
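For anyone who wants to see the two-step pattern outside of SSIS, here is a minimal sketch in Python. Two in-memory SQLite databases stand in for Server1 and Server2, the timestamps are invented, and ThatTable is assumed to have been seeded once with a single placeholder row. The first query plays the role of the first Execute SQL Task (read one scalar into a variable), the second plays the role of the parameterized UPDATE.

```python
import sqlite3

# Two in-memory SQLite databases stand in for Server1 and Server2 (assumption).
server1 = sqlite3.connect(":memory:")
server2 = sqlite3.connect(":memory:")

# Hypothetical sample data for the source table named in the question.
server1.execute("CREATE TABLE cr_stat_execution (EXEC_NAME TEXT, EXEC_END_TIME TEXT)")
server1.executemany(
    "INSERT INTO cr_stat_execution VALUES (?, ?)",
    [
        ("ETL", "2015-03-01 02:10:00"),
        ("ETL", "2015-03-02 02:12:00"),
        ("OTHER", "2015-03-03 01:00:00"),
    ],
)

# The target holds exactly one row; it is seeded once and only ever updated.
server2.execute("CREATE TABLE ThatTable (ThatField TEXT)")
server2.execute("INSERT INTO ThatTable VALUES ('1900-01-01 00:00:00')")

# Step 1 (first Execute SQL Task): read a single value into a variable.
# SQLite has no TOP, so LIMIT 1 plays the role of SELECT TOP 1 here.
last_exec = server1.execute(
    "SELECT EXEC_END_TIME FROM cr_stat_execution "
    "WHERE EXEC_NAME = 'ETL' ORDER BY EXEC_END_TIME DESC LIMIT 1"
).fetchone()[0]

# Step 2 (second Execute SQL Task): overwrite the single row using the
# ? placeholder, so no history accumulates in the target table.
server2.execute("UPDATE ThatTable SET ThatField = ?", (last_exec,))
server2.commit()

result = server2.execute("SELECT ThatField FROM ThatTable").fetchall()
```

Because the statement is an UPDATE rather than an INSERT, the target never grows past its one row, no matter how often the package runs.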

R code production in SQL Server

Given the above, if I wrote a SQL query that produced the defined table and an R summary dataframe that produced the defined table, how would I be able to link the two together in production?
*Note, I'm not asking how to create a SQL query or an R Dataframe for the defined tables (I have the code for that), but rather how can they both work together. For example, could that R dataframe code be used in SQL Server (I have the latest version) so that as soon as the SQL query created the tables (BY DATE), the summary table would automatically update itself (BY DATE) in SQL Server?
So as soon as it had aggregated the summary for the first date, it would move on to the next and essentially generate (stack on top of each other)
Thank you.

Adding new data to the end of a table using power query

I've got a query that pulls data from SQL Server 2012 into Excel using Power Query. However, when I change the date range in my query, I'd like to pull the new data into my table and store it below the previously pulled data without deleting it. So far all I've found is the refresh button, which reruns my query with the new dates but replaces the old data. Is there a way to accomplish this? I'm using it to build an automated QA testing program that will compare this period to the last. Thank you.
This sounds like an incremental load. If your table doesn't exceed 1.1 million rows, you can use the technique described here: http://www.thebiccountant.com/2016/02/09/how-to-create-a-load-history-or-load-log-in-power-query-or-power-bi/

How to query SQL table for each row returned by SSIS Excel Source component

If I have an SSIS Excel Source reading data from an Excel workbook, is there any way I can query a SQL table for each row returned?
Example:
The Excel Source returns EmployeeID, EmployeeName and Department. For each row returned by the Excel Source, is it possible to query a SQL Server table for the EmployeeCategory where the EmployeeID in the Excel row matches the EmployeeID in the EmployeeCategory SQL table?
I would like to end up with a result set in the format
EmployeeID(Excel), EmployeeName(Excel), Department(Excel) , EmployeeCategory(SQL Table)
Certainly.
This is exactly what the SSIS Lookup Transformation does.
Have a look here: https://www.youtube.com/watch?v=WMfuXYsWZqM
I should add that, for performance reasons, you really don't want to perform a new query for each record in your data flow, as that would be extremely slow. What the SSIS Lookup Transformation does instead is create an in-memory cache of your SQL Server table and then look up the value of EmployeeCategory for each record in memory, which is blazingly fast.
If you are sure you want to actually perform a query for each record, the Lookup Transformation has a property that lets it run with no cache. If you want even more flexibility (possibly at an even greater performance loss), you could use the OLE DB Command Transformation, which lets you specify the query you want to execute for each record in your data flow.
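The trade-off between the cached and uncached modes can be sketched in Python. This is only an illustration of the idea, not how SSIS implements it: an in-memory SQLite table stands in for the SQL Server table, a list of tuples stands in for the Excel rows, and the sample values are made up.

```python
import sqlite3

# In-memory SQLite stands in for the SQL Server table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeCategory (EmployeeID INTEGER PRIMARY KEY, EmployeeCategory TEXT)"
)
conn.executemany(
    "INSERT INTO EmployeeCategory VALUES (?, ?)", [(1, "Permanent"), (2, "Contract")]
)

# Hypothetical rows as they might arrive from the Excel Source:
# (EmployeeID, EmployeeName, Department)
excel_rows = [(1, "Ann", "Finance"), (2, "Bob", "IT"), (3, "Cid", "HR")]

# Cached mode: one query up front, then every row is matched in memory.
cache = dict(conn.execute("SELECT EmployeeID, EmployeeCategory FROM EmployeeCategory"))
cached_result = [row + (cache.get(row[0]),) for row in excel_rows]

# No-cache mode: one round trip to the database per incoming row.
def lookup_no_cache(employee_id):
    hit = conn.execute(
        "SELECT EmployeeCategory FROM EmployeeCategory WHERE EmployeeID = ?",
        (employee_id,),
    ).fetchone()
    return hit[0] if hit else None  # None marks a row with no match

uncached_result = [row + (lookup_no_cache(row[0]),) for row in excel_rows]
```

Both lists contain the same rows in the EmployeeID, EmployeeName, Department, EmployeeCategory shape asked for; the cached version issues one query in total, the uncached one issues a query per row. The sketch returns None for an unmatched EmployeeID, whereas in the Lookup Transformation the handling of no-match rows is something you configure.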

Using Multiple Sources in SSIS Data Flow Task

For my data flow task I have an OLEDB Source. In its SQL command section I have written a select query based on tables from two different databases held on the same instance. Every time I run this it errors, but when I moved the tables to the same database (for testing purposes) it worked.
I'm guessing from this that the source data needs to come from the same database, but is there any way around this? I tried using a Lookup but I couldn't get it to work. I could create a view in the source database, but I'm guessing there must be a way to keep it all within the package.
Thank you in advance! This is the query I was using in the OLE DB Source:
select *
from commoncomponents.meta.ItemTypeLabelDefinition
where internalid not in
(
select internalid
from iscanimport.dbo.ItemTypeLabelDefinition
)
Not sure why the cross-DB query wouldn't work in the one source, but one method would be to create two OLEDB Sources, one pointing to the CommonComponents DB doing the select from ItemTypeLabelDefinition, and the other pointing to IScanImport and running the select statement from your sub-query. Sort both the same way at source in your queries (Merge Join needs its inputs sorted on the join key), then use a Merge Join transformation to combine them.
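To show what combining two sorted sources buys you, here is a rough Python sketch of the same idea. Two separate in-memory SQLite databases stand in for CommonComponents and IScanImport, the label column and the sample rows are invented, and anti_join_sorted is a hypothetical helper that walks both sorted inputs once to reproduce the NOT IN logic. It is not what Merge Join does internally, just the same kind of single pass over sorted data.

```python
import sqlite3

# Two separate in-memory databases stand in for the two source databases (assumption).
common = sqlite3.connect(":memory:")
iscan = sqlite3.connect(":memory:")

common.execute("CREATE TABLE ItemTypeLabelDefinition (internalid INTEGER, label TEXT)")
common.executemany(
    "INSERT INTO ItemTypeLabelDefinition VALUES (?, ?)",
    [(3, "c"), (1, "a"), (2, "b"), (4, "d")],
)
iscan.execute("CREATE TABLE ItemTypeLabelDefinition (internalid INTEGER)")
iscan.executemany("INSERT INTO ItemTypeLabelDefinition VALUES (?)", [(2,), (4,)])

# Each source runs its own query and sorts on the join key at source.
left = common.execute(
    "SELECT internalid, label FROM ItemTypeLabelDefinition ORDER BY internalid"
).fetchall()
right = [
    r[0]
    for r in iscan.execute(
        "SELECT internalid FROM ItemTypeLabelDefinition ORDER BY internalid"
    )
]

def anti_join_sorted(left_rows, right_keys):
    """Keep left rows whose key is absent from right_keys; both inputs sorted by key."""
    out, j = [], 0
    for row in left_rows:
        key = row[0]
        # Advance the right side until it catches up with the current left key.
        while j < len(right_keys) and right_keys[j] < key:
            j += 1
        if j == len(right_keys) or right_keys[j] != key:
            out.append(row)
    return out

missing = anti_join_sorted(left, right)
```

With the sample data, missing holds the rows with internalid 1 and 3, which is what the original NOT IN query would return. In the package, the equivalent is typically a left outer Merge Join followed by keeping only the rows where the right-hand internalid is NULL.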