SSIS Execute SQL Task parameter mapping

I'm trying to execute a SQL script using the Execute SQL Task in SSIS.
My script just inserts a bunch of name/value pairs into a table. For example -
insert into mytable (name, value)
values
(?, 'value1'),
(?, 'value2')
Now, I want a variable defined in SSIS to be mapped to the parameters in the statement above. I tried defining a scalar variable, but I guess the Execute SQL Task doesn't like that. Oh, and all the name parameters in the insert statement resolve to a single variable.
For example I want
insert into mytable (name, value)
values
('name1', 'value1'),
('name1', 'value2')
When I open the Parameter Mapping tab for the task, it wants me to map each parameter individually, like -
Variable Name: User::Name, Direction: Input, Data Type: LONG, Parameter Name: 0, Parameter Size: -1
Variable Name: User::Name, Direction: Input, Data Type: LONG, Parameter Name: 1, Parameter Size: -1
This quickly gets out of hand and cumbersome if I have 5-10 values for a name, and it forces me to add multiple mappings for the same variable.
Is there an easier way to do this?

The easiest (and most extensible) way is to use a Data Flow Task instead of an Execute SQL Task.
Add a Data Flow Task; I assume that you already have all the variables filled with the right values and that you know how to pass values into them.
Create a dummy row with the columns you will need to insert, using whatever pleases you the most as a source (in this example, I've used an OLE DB connection). One good tip is to define the data type of each column in the source exactly as you will need it in the destination table. This aligns the metadata of the data flow with that of the insert table (Screenshot #1); a sketch of such a dummy-row source query follows the steps below.
Then add a multicast component to the dataflow.
For the first parameter/value, add a derived column component, name it cleanly and proceed to substitute the content of your parameters with your variables.
For each further parameter/value that needs to be added, copy the previously created derived column component, add one extra branch from the multicast component, and substitute the column's parameter/value as necessary.
Add a Union All and join all the flows.
Insert into the table
Voilà! (Screenshot #2)
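As a rough illustration only (assuming a SQL Server OLE DB source and hypothetical nvarchar columns matching the mytable example from the question), the dummy-row source query could be as simple as:
-- One dummy row whose columns are cast to the same data types as the
-- destination table, so the data flow metadata lines up from the start.
SELECT
    CAST(N'dummy' AS NVARCHAR(50)) AS name,
    CAST(N'dummy' AS NVARCHAR(50)) AS value;
Each derived column component downstream then simply overwrites name and/or value, for example with the SSIS variable @[User::Name] and a literal value.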
The good thing about this method is that you can make it as extensible as you wish: validate each value with different criteria, modify the data, add business rules, discard non-compliant values (by checking the full number of complying values)!
Have a nice day!
Francisco.
PS: I had prepared a couple more screenshots... but stackoverflow has decided that I am too new to the site to post things with images or more than two links (!) Oh well..

Related

How to force error or stoppage in SSIS data flow during the debugging process?

The problem is: I do a left join between two tables and then I need to load the whole data set into another table, but only if every row from the first table has a match in the second one; so, to cut it short, only if there are no NULLs in one particular column.
If there is at least one NULL, I want my data flow to fail so that it doesn't load any data into the final table, and then send an email with the error via an Execute SQL Task.
After many tries I can only produce errors when there are NULLs, but these errors are not fatal. How can I raise a fatal error without resorting to something clumsy like a data conversion that can't succeed? I tried setting a breakpoint after some variable changes, but was defeated by SSIS.
If I understand correctly, the Data Flow loads data to Table1. The Execute SQL Task uses Table1 to populate Table2.
The business rule is that the Execute SQL Task should only fire if a column from the previous data load had no NULLs.
The lazy way to handle this is to put the logic in the query itself. Something like the following (and yes, there are ways to optimize this):
INSERT INTO dbo.Table2
SELECT *
FROM dbo.Table1
WHERE NOT EXISTS (SELECT 1 FROM dbo.Table1 AS T WHERE T.MyColumn IS NULL)
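One possible shape of that optimization (a sketch only, using the same hypothetical table and column names) is to make the no-NULLs check explicit and separate from the load:
-- Check once whether any offending NULLs exist, then load in a single pass.
IF NOT EXISTS (SELECT 1 FROM dbo.Table1 WHERE MyColumn IS NULL)
BEGIN
    INSERT INTO dbo.Table2
    SELECT * FROM dbo.Table1;
END;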
To make this happen only in SSIS,
Add a Variable to the Package called NullRowCount and initialize it to zero.
In the Data flow, add a Multicast between the Join and the Destination. Route one path to the destination
In the Data flow, connect a Conditional Split to a new path from the Multicast. Configure the Conditional Split to have an Output Name of "No Data" and use an expression like ISNULL([MyColumn]). That's a boolean, yes/no.
In the Data flow, add a Row Count transformation to the Conditional Split and attach it to the "No Data" pipe (the Default pipe will contain rows that have values in MyColumn). Use the User::NullRowCount variable in the Row Count transformation.
Finally, double-click the precedence constraint you have between the Data Flow and the Execute SQL Task. Set it back to an On Success constraint and then change the evaluation option from Constraint to Constraint and Expression. Here, we'll use an expression of @[User::NullRowCount] == 0
In plain English, we are going to have the data flow count how many rows in our set have a NULL in MyColumn. The precedence constraint will allow/disallow the Execute SQL Task to run, and the criteria we specify are that the data flow had to run successfully and that the count of rows with a NULL in them is zero.
If, say, you wanted to have an action when the count is non-zero (send an email or other alert), then you would add another task and configure its constraint with Expression and Constraint, but now use an expression of @[User::NullRowCount] > 0
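For intuition, the check the package performs is equivalent to something like the query below (table and column names taken from the example above); the difference is that the package computes the count in-flight with the Row Count transformation rather than with a separate query:
-- Count the rows that violate the rule; the second load step should
-- only run when this count comes back as zero.
SELECT COUNT(*) AS NullRowCount
FROM dbo.Table1
WHERE MyColumn IS NULL;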
Based on the comment
maybe I can stop it (force an error) inside the data flow before loading the data? Because this SQL task sends an email, I want the whole ETL process to be done in one data flow
No, not really. Assuming you swapped out the Row Count in the above for a Script Component that explicitly raises an error, or a Derived Column that forces a divide by zero - either of those would interrupt the data flow, but you wouldn't know whether it was the first row of the data flow that caused the exception or the one billionth. In the latter case, the data has already flowed into the destination (unless you have a commit size of 0, which can lead to other issues) and you have partially loaded data.
Ultimately, you need to preprocess your data to identify whether there's data that does not conform to expectations. I would make the above changes - count whether you have any bad data - but instead of landing the data in a table, land it in a Raw File. A raw file is a compact binary copy of the data, so yes, you'll pay a disk IO penalty, but it will save you reprocessing the data if it's valid.
Then you add a new Data Flow Task that only runs when you have a zero null count, using the precedence constraint approach described above. This new data flow is just a Raw File Source to the "original destination." Now you get a clean separation: data lands in your table only if it is pristine, and you don't have to worry about partial loads.

Loop for Pentaho where I redefine a variable on each execution

I have an Excel file with 300 rows. I need to use each of these rows as a field name in a transformation.
I was thinking of creating a job that, for each row of a table, sets a variable that I then use in my transformation.
I tried defining a variable as the value I have in one row and the transformation works. Now I need a loop that gets value after value and redefines the variable I created then executes the transformation.
I tried to define a Job that has the following:
Start -> Transformation(ExcelFileCopyRowsToResult) -> SetVariables -> Transformation(The transf that executes using whatever the variable name is at the moment).
The problem is that the variable I defined never changes and the transformation result is always the same because of that.
Executing a transformation for each row in a result set is a standard way of doing things in PDI. You have most of it correct, but instead of setting a variable (which only happens once in the job flow), use the result rows directly.
First, in the job entry's edit window, configure the second transformation to execute for each input row.
You can then use one of two ways to pass the fields into the transformation, depending on which is easier for you:
Start the transformation with a Get rows from result step. This will give you one row each time. The fields will be in the stream directly and can be used as such.
Pass the fields as parameters, so they can be used like variables. I use this one more often, but it takes a bit more setup.
Inside the second transformation, go to the transformation properties and enter the parameter names you want in the Parameters tab.
Save the transformation.
In the job, open the transformation edit window and go to Parameters.
Click Get Parameters.
Type the field name from the first transformation under Stream Column Name for each parameter.

PDI /Kettle - Passing data from previous hop to database query

I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle is selecting only the vendor name from the 8 fields in the CSV.
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
The best practice is a Stream lookup step. For each record in the main flow (VendorRatings), look up the vendor details (the lookup fields) in the reference file (the CSV), based on its identifier (possibly its number, its name, or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input.
The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.
Due to the fact that:
there is at most one VendorRating per vendor, and
you have to do something if there is no match,
I suggest the following flow:
Read the CSV and, for each row, look up in the table (i.e. the lookup table is the SQL table rather than the CSV file), and set a default value when there is no match. I suggest something really visible, like "--- NO MATCH ---".
Then, in case of no match, a filter redirects the flow to the alternative action (here: insert into the SQL table). The two flows are then merged into the downstream flow.
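For intuition only, the equivalent set-based logic in SQL would look something like the sketch below (CsvStage is a hypothetical staging of the CSV rows; the PDI flow does this row by row with the lookup, filter and insert steps instead of in one statement):
-- Vendors from the CSV that have no match in VendorRatings get inserted;
-- matched vendors simply pass through to the downstream flow.
INSERT INTO VendorRatings (vendor)
SELECT c.vendor
FROM CsvStage AS c
LEFT JOIN VendorRatings AS v ON v.vendor = c.vendor
WHERE v.vendor IS NULL;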

How to put the values of a column of an internal table into a variant?

Can somebody help me figure out if there is a way to do the following:
I have an internal table with one column with 69 records.
I want all these 69 records to be populated into a variant and saved, so that with this variant and the values saved in it I can run a particular program.
How can I populate these values?
Your question is a bit unclear to me.
Do you speak about two programs or one program?
What's the parameter you want to fill in the variant?
I'll just give you some hints - depending on your situation you must pick the correct parts.
I have an internal table with one column with 69 records.
How is the internal table filled?
I want all these 69 records to be populated into a variant and get saved so that with this variant and the values saved in it I can run a particular program
You have a program and you want to save a selection in a variant. So you need some parameters for the selection screen.
You want a table, so you need a SELECT-OPTION.
To define a SELECT-OPTION you need a DDIC reference (you must say what kind of field you want). In the following example I use a material number (MARA-MATNR).
So your program contains something like:
TABLES mara.
SELECT-OPTIONS: s_matnr FOR mara-matnr.
With this you get a selection screen where you can define ranges (from-to) and lists of values. As you want only single values, you need something like:
SELECT-OPTIONS: s_matnr FOR mara-matnr NO INTERVALS.
Now the selection screen shows a single input field with a multiple-selection button. In the multiple-selection dialog you can enter values one by one, load them from an external file, or paste them from the clipboard.
So you can fill your values and store the selection in a variant.
When you execute your program, the entered values are stored in a ranges table (with the fields SIGN, OPTION, LOW and HIGH).
Now you can loop over this table and copy the S_MATNR-LOW value into your internal table for further processing.
If I misunderstood your question and you want to create a variant dynamically, then take a look at the function module RS_VARIANT_ADD (or RS_VARIANT_COPY, RS_VARIANT_CHANGE, ...).
You could always put the values in TVARVC, either manually or via code. Then specify the TVARVC variable in the variant definition.

SSIS external metadata column needs to be removed

I am creating a select statement on the fly because the column names and table name can change, but they all need to go into the same data destination. There are other commonalities that make this viable; if I need to, I will go into them later. So, what it comes down to is this: I am creating the select statement with 16 columns; there will always be sixteen columns, no more, no less, but the column names can change and the table name can change. When I execute the package, the select statement gets built just fine, but when the Data Flow tries to execute, I get the following error:
The "external metadata column "ColumnName" (79)" needs to be removed from the external metadata column collection.
The actual SQL Statement being generated is:
select 0 as ColumnName, Column88 as CN1, 0 as CN2, 0 as CN3, 0 as CN4,
0 as CN5, 0 as CN6, 0 as CN7, 0 as CN8, 0 as CN9, 0 as CN10,
0 as CN11, 0 as CN12, 0 as CN13, 0 as CN14, 0 as CN15 from Table3
The column 'Column88' is generated dynamically, and so is the table name. If source columns exist for the other 'as CNx' columns, they will appear the same way (Column88 as CN1, Column89 as CN2, Column90 as CN3, etc.), and the table name will always be in the form Tablex, where x is an integer.
Could anyone please help me out with what is wrong and how to fix it?
You're in kind of deep here. You should just take it as read that you can't change the apparent column names or types. The names and types of the input columns become the names and types of the metadata flowing down from the source. If you change those, then everything that depends on them must fail.
The solution is to arrange for these to be stable, perhaps by using column aliases and casts. For one table:
SELECT COLNV, COLINT FROM TABLE1
for another
SELECT CAST(COLV AS NVARCHAR(50)) AS COLNV, CAST(COLSMALL AS INTEGER) AS COLINT FROM TABLE2
Give that a try and see if it works out for you. You just really can't change the metadata without fixing up the entire remainder of the package.
I had the same issue when I had to remove a column from my stored procedure (which writes to a temp table) in SQL and add two columns. To resolve the issue, I had to go through each part of my SSIS package from the beginning (the source, which in my case pulls from a temporary table) all the way through to the destination (in my case a flat file connection to a CSV file). I had to redo all the mappings along the way, and I watched for errors that came up in the GUI data flow tasks in SSIS.
This error came up for me in the form of a red X with a circle around it; when I hovered over it, it mentioned the metadata issue. I double-clicked it and it warned me that one of my columns didn't exist anymore and asked whether I wanted to delete it. I did delete it, but I can tell you that this error has more to do with SSIS telling you that your mappings are off and that you need to go through each part of your SSIS package to make sure everything is mapped correctly.
How about using a view in front of the table and pointing the SSIS source at the view? That way, you can map the columns as necessary and use ISNULL or COALESCE to keep a consistent column pattern. A rough sketch of such a view is below.
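For illustration only (the view name is made up, and the Table3/Column88/CNx names are just the ones from the generated statement above; adjust the casts to whatever types the destination expects):
-- Present a fixed set of column names and data types to SSIS,
-- no matter how the underlying table happens to be shaped today.
CREATE VIEW dbo.vw_StableSource
AS
SELECT
    CAST(0 AS INT)                   AS ColumnName,
    CAST(ISNULL(Column88, 0) AS INT) AS CN1,
    CAST(0 AS INT)                   AS CN2,
    CAST(0 AS INT)                   AS CN3
    -- ... and so on through CN15
FROM dbo.Table3;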