SSIS - Passing a variable as a column name in the WHERE clause of an ADO.NET source SQL command expression

I'm trying to build a template SSIS package that will pull from our source system and populate our load tables. We're reading from Oracle and loading into SQL Server, and the package works as follows:
A variable "SourceDelta" of type String, whose value is blank.
An Execute SQL Task that runs a select statement against a SQL table holding the delta column name (LastUpdateDT); the result set is mapped to the variable SourceDelta.
A data flow task whose source is an ADO.NET source. I have an expression on the data flow task, which is:
select *
from VariableSourceLocation
where
SourceDelta >= ##
and SourceDelta < ##
Fine, you'd think; it would populate the SourceDelta column name into the expression. However, when I open the data flow task and then the ADO.NET source, the SQL command text reads as follows:
select *
from VariableSourceLocation (sets correctly showing the value)
where
BLANK>= ##
and BLANK< ##
It just passes a blank, because the variable has no value until the previous step has run. It's as though the variable is not filled in before package validation, so validation fails because the where clause of the query can't run.
Banging my head against the wall with this, any help appreciated. Fully aware of injection issues surrounding this but it's purely for me to create SSIS data flow packages utilising a single sql table only I will have access to for variables. It'll speed up the onerous task a lot.
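To make the symptom concrete, here is a rough Python sketch (not SSIS code) of what a property expression amounts to: plain string concatenation over whatever the variables hold at the moment of evaluation. The function names, the `ALLOWED_DELTA_COLUMNS` whitelist, and the `:from_dt` / `:to_dt` placeholders (standing in for the elided `##` values) are all assumptions made for illustration.

```python
# Hypothetical whitelist: column names the config table is allowed to supply.
ALLOWED_DELTA_COLUMNS = {"LastUpdateDT"}


def build_command(source_location: str, source_delta: str) -> str:
    """Mimic the expression: concatenate the current variable values."""
    return (
        f"select * from {source_location} "
        f"where {source_delta} >= :from_dt and {source_delta} < :to_dt"
    )


def build_checked_command(source_location: str, source_delta: str) -> str:
    """Same concatenation, but refuse blank or unexpected column names."""
    if source_delta not in ALLOWED_DELTA_COLUMNS:
        raise ValueError(f"unexpected delta column: {source_delta!r}")
    return build_command(source_location, source_delta)


# Design time / validation: the variable is still blank, so the text is not valid SQL.
design_time = build_command("VariableSourceLocation", "")
# Run time: the Execute SQL Task has already populated the variable.
run_time = build_command("VariableSourceLocation", "LastUpdateDT")
```

The blank column name in `design_time` matches what the source editor shows before the Execute SQL Task has run; only the run-time text is a valid query.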

Related

ODBC source doesn't receive dynamic variable value

I am designing an incremental load for my ETL solution in SSIS. For that, I have an Execute SQL task that gets the maximum load time from the data warehouse and then stores it in a package variable, which is already set to evaluate as expression.
Then I have set the ODBC source's sql command property to an expression that has my query and the variable. However, I have looked into the variable during debug and when I run the package it seems that this variable doesn't get used in the sql command, instead it remains null.
I have already tried setting the variable property 'EvaluateAsExpression' to True, I have tried storing the query in a different variable and then setting that as the sql command for my ODBC source.

SSIS Using For Loop Container Not Executing Script Task and Data Flow Tasks

I'm reaching out to the experts as I have hit a wall with a recent project. I have created an SSIS package (2008 R2) that uses a script task to build a SQL statement, where a variable (@month1) is used within the SQL statement to specify a month look-back in a membership table. I also want to use the @month1 variable as a "counter" for the loop container to specify how many times to execute the query. The SQL query is attached to a data flow task to append these records into a table in a SQL Server database. The script task and data flow task work outside of the For Loop container with the initial value given for the @month1 variable, but I cannot figure out how to make the For Loop container update the @month1 "counter" variable so that the loop can use it as a "counter" and the SQL statement can use it as a condition within the created SQL statement. Does anyone have any ideas or examples on how to do this?
** Update **
The For Loop container is the issue. The script task and data flow task work outside of the For Loop container: the package uses the initial setting of @month1, creates the dynamic SQL script, executes it, and transfers data from the source database server to the destination server. The issue is that when I place these steps within the For Loop container, the container executes and turns green but does not invoke the steps within it. This is why I'm thinking the container is not reading the variable @month1, even though the variable is set at the package level. Any thoughts?
First of all, try setting the data flow task's DelayValidation property to True. If it is still not working, use an expression instead of passing the variable as a parameter in the OLE DB Source:
Create a variable of type String.
Change its EvaluateAsExpression property to True.
Set the expression to something similar to:
"select * from table where column = " + (DT_WSTR, 50) @[User::month1]
In the OLE DB Source, select "SQL command from variable" as the data access mode and select this variable.
Also make sure the month1 variable has not been created twice with different scopes: click on the data flow task and check whether the Variables pane shows additional variables.
I appreciate everyone's responses, but it seems I tricked myself on this one. In looking for the most complicated issue I overlooked the most simple and obvious one. The reason my For Loop was not executing the steps inside the container was that I had the initial value for @month1 set to 3 (intentionally) and wanted to loop until it reached -49. The EvalExpression setting keeps the loop running until the expression is FALSE, so the evaluation I had in there, @month1 <= -49, was already false on the first check. It needed to be @month1 > -49, so that as soon as the variable fell to -49 the expression would become false. I do this to myself more than I should admit; can't see the forest for the trees!
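The effect of the two conditions can be checked with a small Python stand-in for the For Loop container (a sketch, not SSIS code). It assumes the AssignExpression decrements the counter by 1 per pass, which the post does not state; `count_iterations` and its `limit` guard are hypothetical names.

```python
def count_iterations(init, keep_looping, step=-1, limit=1000):
    """Run a For Loop-style counter: the condition is evaluated BEFORE each pass."""
    month1 = init
    passes = 0
    while keep_looping(month1) and passes < limit:
        passes += 1     # the Script Task and Data Flow Task would run here
        month1 += step  # assumed AssignExpression: @month1 = @month1 - 1
    return passes


# Original EvalExpression: false on the very first check, so the body never runs.
wrong = count_iterations(3, lambda m: m <= -49)
# Corrected EvalExpression: runs for 3, 2, ..., -48 and stops once -49 is reached.
right = count_iterations(3, lambda m: m > -49)
```

With these assumptions the first condition yields zero passes, which is consistent with the container turning green without running its contents.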

capture executed sql from input table in pentaho pdi

I am using pentaho for data migration testing. I have set a "table input" step where many parts of the query inside "table inputs" are variables. I have been looking for a way to capture that query after it gets executed during runtime.
I was wondering if there are any specific system log variables for SQL, or whether it has to do with metadata. Any help is appreciated, thanks!
Maybe the following approach will help:
We assume a transformation reading a CSV file to get the dynamic portion of the SELECT statement (e.g. the columns) and setting the variable columns with it.
The second transformation uses this variable to generate the SELECT statement and store it into the variable sql_statement.
In the main transformation we use ${sql_statement} as the SELECT statement of the table input and write the data to an output file (that's the business process so to say). From the same input we copy the output to another path. There we add the current time as a field (use element "Get system data") and we add the generated SQL statement, join them as a cartesian product and group the result by the sql_statement. That way we can compute the first time and the last time that the statement was used. These results are written to a text file.
The last thing we need is a job calling the three transformations sequentially.
This is a sample output:
sql_statement;min_time;max_time
SELECT my_column FROM test_table;2014/05/08 00:41:21.143;2014/05/08 00:41:21.144
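The group-by step that yields the first and last time per statement can be sketched in Python (an illustration only, not PDI code; `summarize` and the sample rows are made up to mirror the output above):

```python
from collections import defaultdict


def summarize(rows):
    """rows: iterable of (sql_statement, timestamp_string) pairs."""
    seen = defaultdict(list)
    for statement, ts in rows:
        seen[statement].append(ts)
    # Timestamps in the fixed 'YYYY/MM/DD HH:MM:SS.mmm' layout sort correctly as strings.
    return {stmt: (min(times), max(times)) for stmt, times in seen.items()}


rows = [
    ("SELECT my_column FROM test_table", "2014/05/08 00:41:21.144"),
    ("SELECT my_column FROM test_table", "2014/05/08 00:41:21.143"),
]
summary = summarize(rows)
lines = ["sql_statement;min_time;max_time"] + [
    f"{stmt};{first};{last}" for stmt, (first, last) in summary.items()
]
```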
Thank you Marcus! I did something similar.
It works. Awesome.
I gathered the parts of the queries from the table fields where they were kept and assembled the full query in JavaScript. The full query is then sent as a parameter to a transformation that runs and logs the query.

Using dynamic SQL in an OLE DB source in SSIS 2012

I have a stored proc as the SQL command text, which is getting passed a parameter that contains a table name. The proc then returns data from that table. I cannot call the table directly as the OLE DB source because some business logic needs to happen to the result set in the proc. In SQL 2008 this worked fine. In an upgraded 2012 package I get "The metadata could not be determined because ... contains dynamic SQL. Consider using the WITH RESULT SETS clause to explicitly describe the result set."
The problem is I cannot define the field names in the proc because the table name that gets passed as a parameter can be a different value and the resulting fields can be different every time. Anybody encounter this problem or have any ideas? I've tried all sorts of things with dynamic SQL using "dm_exec_describe_first_result_set", temp tables and CTEs that contains WITH RESULT SETS, but it doesn't work in SSIS 2012, same error. Context is a problem with a lot of the dynamic SQL approaches.
This is latest thing I tried, with no luck:
DECLARE @sql VARCHAR(MAX)
SET @sql = 'SELECT * FROM ' + @dataTableName
DECLARE @listStr VARCHAR(MAX)
SELECT @listStr = COALESCE(@listStr +',','') + [name] + ' ' + system_type_name FROM sys.dm_exec_describe_first_result_set(@sql, NULL, 1)
exec('exec(''SELECT * FROM myDataTable'') WITH RESULT SETS ((' + @listStr + '))')
So I ask out of kindness, but why on God's green earth are you using an SSIS Data Flow task to handle dynamic source data like this?
The reason you're running into trouble is because you're perverting every purpose of an SSIS Data flow task:
to extract a known source with known metadata that can be statically typed and cached in design-time
to run through a known process with straightforward (and ideally asynchronous) transformations
to take that transformed data and load it into a known destination also with known metadata
It's fine to have parameterized data sources that bring back different data. But to have them bring back entirely different metadata each time with no congruity between the different sets is, frankly, ridiculous, and I'm not entirely sure I want to know how you handled all your column metadata in the working 2008 package.
This is why it wants you to add a WITH RESULT SETS clause to the SSIS query - so it can generate some metadata. It doesn't do this at runtime - it can't! It has to have a known set of columns (because it aliases them all into compiled variables anyway) to work with. It expects the same columns every time it runs that Data Flow Task - the exact same columns, down to the names, the types, and the constraints.
Which leads to one (terrible, terrible) solution - just stick all the data into a temporary table with Column1, Column2 ... ColumnN and then use the same variable you're using as the table name parameter to conditionally branch your code and do whatever you want with the columns.
Another more sane solution would be to create a data flow task for each of your source tables, and use your parameter in a precedence constraint to just pick which data flow task should run.
For a solution this poorly tailored for an out-of-the-box ETL, you should also highly consider just rolling your own in C# or a script task instead of the Data Flow Task provided by SSIS.
In short, please don't do this. Think of the children (packages)!
I've used CozyRoc Dynamic DataFlow Plus to achieve this.
Using configuration tables to build the SQL Select statements, I have a single SSIS package that loads data from Oracle and Sybase (or any OLEDB source) to MS SQL. Some of the result sets are in the millions of rows and performance is excellent.
Instead of writing a new package every time a new table is needed, this can be configured in minutes and run on the pre-tested and robust existing package.
Without it I would have been up for writing hundreds of packages.

How do I pass system variable value to the SQL statement in Execute SQL task?

SSIS 2008. Very simple task. I want to retrieve a system variable and use it in an SQL INSERT. I want to retrieve the value of System::MachineName and use it in an insert statement.
Using the statement INSERT INTO MYLOG (COL1) SELECT @[System::MachineName] gives the error Error: ..failed to parse. Must declare the scalar variable "@"
Using the statements SELECT @System::MachineName or SELECT @@[System::MachineName] gives the error 'Error Incorrect syntax near '::'
I am not trying to pass a parameter to the query. I have searched for a day already but couldn't find how to do this one simple thing!
Here is one way you can do this. The following sample package was created using SSIS 2008 R2 and uses SQL Server 2008 R2 as backend.
Create a sample table in your SQLServer database named dbo.PackageData
Create an SSIS package.
In the SSIS package, add an OLE DB connection manager named SQLServer that connects to your SQL Server database.
On the Control flow tab, drag and drop an Execute SQL Task
Double-click on the Execute SQL task to bring the Execute SQL Task Editor.
On the General tab of the editor, set the Connection property to your connection manager named SQLServer.
In the property SQLStatement, enter the insert statement INSERT INTO dbo.PackageData (PackageName) VALUES (?)
On the Parameter Mapping tab, click Add button, select the Package variable that you would like to use. Change the data type accordingly. This example is going to insert the PackageName into a table, so the Data Type would be VARCHAR. Set the Parameter Name to 0, which indicates the index value of the parameter. Click OK button.
Execute the package.
You will notice a new record inserted into the table. I retained the package name as Package. That's why the table
Hope that helps.
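The same positional `?` placeholder idea can be sketched in Python, with the standard-library `sqlite3` module standing in for the OLE DB connection. That substitution is an assumption made purely for illustration: in SSIS the binding is done through the Parameter Mapping tab, and the `dbo.` schema prefix is dropped here because SQLite has no such schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the SQLServer connection manager.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PackageData (PackageName VARCHAR(50))")

# Plays the role of the package variable mapped on the Parameter Mapping tab.
package_name = "Package"

# The first '?' corresponds to Parameter Name 0 in the mapping.
conn.execute("INSERT INTO PackageData (PackageName) VALUES (?)", (package_name,))
conn.commit()

rows = conn.execute("SELECT PackageName FROM PackageData").fetchall()
```

Because the value is bound rather than concatenated, the statement text never changes, whatever the variable contains.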
Per my comment against @ZERO's answer (repeated here as an answer so it isn't overlooked by SSIS newcomers).
The OP's question is pretty much the use case for SSIS property expressions.
To pass an SSIS variable into the query string, one would concatenate it into an expression set for the SqlStatementSource property, wrapping the value in single quotes so that it forms a valid string literal:
"INSERT INTO MYLOG (COL1) SELECT '" + @[System::MachineName] + "'"
This is not to suggest the accepted answer isn't a good pattern, as in general, the parameterised approach is safer (against SQL injection) and faster (on re-use) than direct query string manipulation. But for a system variable (as opposed to a user-entered string) this solution should be safe from SQL injection, and this will be roughly as fast or faster than a parameterised query if re-used (as the machine name isn't changing).
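As a rough Python sketch of what the concatenated statement text ends up looking like (not SSIS expression syntax): the value has to be wrapped in single quotes to be a valid SQL string literal, and doubling any embedded quote keeps the literal well formed. `sql_statement_source` and the machine name `BUILD01` are made up for illustration.

```python
def sql_statement_source(machine_name: str) -> str:
    """Build the statement text by concatenation, quoting the value as a literal."""
    escaped = machine_name.replace("'", "''")  # double embedded single quotes
    return "INSERT INTO MYLOG (COL1) SELECT '" + escaped + "'"


statement = sql_statement_source("BUILD01")
```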
I have never used it before, but you could check out the use of expressions in the Execute SQL Task for that.
Or just put the whole query into the expression of a variable with EvaluateAsExpression set to True. Then use OLE DB to do your insert.
Along with @user756519's answer: depending on your connection string, your variable names and SQLStatementSource change.