Execute SQL step in Pentaho

I have created a transformation that includes a Table Input step, an Execute SQL step, and an Excel Output step.
Table Input --> runs a query and returns the field "query", which contains the SQL statement select * from dual.
Execute SQL step --> that query field is passed in dynamically using '?' with variable substitution enabled.
Excel Output --> the expectation is that the SQL query is executed and its result written to the Excel output.
But I can't get the fields from the Execute SQL step. How can I do this?
Thanks
Kavitha S

Use the Database Join step instead of the Execute SQL step. The Database Join step allows you to run a query against a database using data obtained from previous steps.
Database Join input: you can pass any data you want from the previous step using the ? notation in the SQL query defined inside the step.
Database Join output: it executes the parameterized SQL query and adds the returned fields to the output stream.
This step is what you need for your second step. See the documentation for more information about the Database Join step.
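For illustration, the SQL inside a Database Join step might look like the sketch below, where each ? is filled from a field of the incoming row (the table and column names here are made up, not taken from the question):
SELECT e.EMPID, e.EMPNAME
FROM Employee e
WHERE e.DEPT_ID = ?
The step then appends the returned columns (EMPID, EMPNAME) to every incoming row, which you can send straight to the Excel output.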

In PDI, "Execute SQL Step" is not meant for generating rows. It will not add any extra row to the data stream. You got Table Input step to generate multiple rows.
What you can try as an alternative is to break the transformation into two parts.
Part 1: Table Input Step > (query rows are generated) >> Use "Set variables" or "copy rows to result" to some other steps to set the query into some variable e.g: query.
Part 2: Take another Table Input Step (into a next .ktr file) and use the variable substitution of ${query} >> Finally output the result set to the excel output.
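A minimal sketch of the two parts (the field and variable name query come from the question; the "Replace variables in script" option of the second Table Input must be enabled):
Part 1: Table Input (produces a field query holding, e.g., select * from dual) >> Set Variables (field query -> variable query)
Part 2: Table Input with the SQL area containing only ${query} >> Microsoft Excel Output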
For dynamic SQL queries, you can read this blog.
In case you have some lookups to do with the generated query, you can use the Dynamic SQL row step to generate the rows.
Hope it helps :)

Related

SSIS save value as a parameter

I am using the command SELECT UpdateDate FROM dbo.log in an Execute SQL Task. I'm fairly new to this, so please bear with me. I want to store the value in a variable and then pass it into the WHERE clause of a subsequent data flow. My questions are:
What is the correct way to set up the Execute SQL Task? In the General tab I have the OLE DB connection and direct input with the query above. Result Set is set to Single row, and I am storing the result in a variable I have created called User::UpdateDate. For some reason this doesn't work.
I then want to use this date in a data flow, i.e. SELECT * FROM Users WHERE RecordDate > User::UpdateDate. I believe the syntax is different for this.
I would really appreciate some help with this. Many thanks in advance
In your Execute SQL Task Editor, configure the Parameter Mapping as shown below; obviously, use your own variable. In this example I'm using PackageStartTime.
Then in your SQL statement, use below:
SELECT * FROM Users WHERE RecordDate > ?
To save a value from a SQL statement, you will need to set the Result Set to Single row and configure the result set as shown in the example below:
Execute SQL Task with ResultSet
First of all, create a variable of type DateTime, for example #[User::UpdateDate].
Add an Execute SQL Task, select the OLE DB connection, and use the following command as the SQL statement:
SELECT TOP 1 UpdateDate FROM dbo.log
Set the ResultSet property to Single row, and in the Result Set tab add a row with the following values:
ResultName = 0 (which means the first column)
VariableName = #[User::UpdateDate]
Additional Information
SSIS Basics: Using the Execute SQL Task to Generate Result Sets
OLEDB Source with parameterized SQL Command
Inside the Data Flow Task, add an OLE DB Source, set the Access Mode to SQL Command, and write the following command:
SELECT * FROM Users WHERE RecordDate > ?
Click on the Parameters button and map the variable #[User::UpdateDate] as the first parameter.
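Conceptually, if #[User::UpdateDate] held, say, 2014-05-08 (only an illustrative value), the source would execute the equivalent of:
SELECT * FROM Users WHERE RecordDate > '2014-05-08'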
Additional Information
Map Query Parameters to Variables in a Data Flow Component
Parameterized OLEDB source query

Pentaho Transformation "Execute SQL Statements" vs "Table Input" step

I am new to PDI/Kettle. I need to execute a simple SQL select similar to "SELECT EMPID FROM Employee" and write the output to a "Microsoft Excel Output" step as part of report generation.
When I write this query in the "Execute SQL Statements" step, under the "SQL Scripts to execute" section of my transformation, and run it, it returns nothing, but the transformation completes without any errors.
No result is written to my output file. The same behavior occurs with the "Execute row SQL script" step when reading the input from a SQL file or a Data Grid with the query as input.
Transformation flow:
EXECUTE SQL STATEMENTS >> Microsoft Excel Output
EXECUTE SQL STATEMENTS >> Text file Output
If I use "Table Input" step and write the query under "SQL" section , it is getting executed and giving the result.
Table Input >> Microsoft Excel Output
Table Input >> Textfile Output
Can anyone help me understand this behavior and the context/use cases of these steps?
Thank you, techies, for sharing your knowledge on this.
As per my understanding, the "Execute SQL statements" step is used to execute SQL statements such as DDL and DML, but it won't give any result to the output stream except the number of records affected (statistics) when we execute DML statements.
To track these statistics, there are optional fields such as insert stats, update stats, delete stats and read stats; based on your DML statement, you can give a field name, and the number of records affected will be written as the value of that field. This can be seen in "Preview data" under the transformation execution results.
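For example, a DML statement of this kind run in the Execute SQL statements step (the table and column names here are only illustrative), with the update stats field set to something like rows_updated, adds only that count to the stream, not the updated rows themselves:
UPDATE Employee SET STATUS = 'INACTIVE' WHERE LAST_LOGIN < '2014-01-01'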
The Execute SQL Statements step does not provide any result. Its purpose is DDL (Data Definition Language), to drop/create/truncate/alter tables, and DML (Data Manipulation Language), to insert/update/delete rows.
Two checks (among others) should become second nature after coding every step:
Check the output columns (right click on the step, choose Output fields).
Make a preview of the results (right click on the step, choose Preview).
Let me explain one basic concept of Pentaho PDI (Kettle): all actions in Kettle happen on a row. If there is no row, there will be no action. So if you add a Generate Rows step at the beginning of your transformation, with one dummy row containing some value, you will see your SQL statement get triggered.
At a glance, Pentaho works with these two premises:
1. Everything is an asynchronous flow.
2. Every action happens at row level (no row, no action).
A Table Input step generates rows, but Execute SQL Statements is not an input step type; it is a transform step and expects rows to have already been generated before it.
I think these two basic concepts can help you understand how Kettle works.
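So a minimal layout to get the statement triggered, in the flow notation used above, would be something like:
Generate Rows (1 dummy row) >> Execute SQL Statements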

How to run a select SQL statement within a field in Pentaho?

I have a table with a 'query' field containing a SELECT SQL statement and another 'parameters' field containing the SQL parameters. I have merged these two fields into a new field containing a complete, correct SELECT statement. Now I need to execute this new field containing the SELECT, get the result of the SELECT (the output fields), and generate an Excel file.
Use Table-Input if you are interested in a query result set. Table-Input supports SQL parameters, so there is no need to build the statement yourself using e.g. Replace-In-String and tripping over escapes on your way. Also, there's variable substitution, just in case you can't live with a single template.
Update 21:14 GMT
I'm not very fond of the way you prepare the SELECT statement, but here we go, assuming it's a single statement:
Create a job with a Start entry and two Transformation entries (T1, T2). Let T1 produce the field containing your SELECT statement and use a Set-Variables step to make the statement available to T2 as the variable SELECT. In T2 use a Table-Input step referencing ${SELECT} in the SQL statement text area. Don't forget to enable the option "Replace variables in script".
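A sketch of that layout (the job and transformation names are made up; the variable name SELECT is taken from above):
Job: Start >> T1 (build the statement, Set-Variables: SELECT) >> T2
T2: Table-Input with the SQL area containing only ${SELECT} >> Text-File-Output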
From now on it's a matter of taste. I would prefer to create a CSV file using Text-File-Output. With the right field separator, Excel will open the file after double-clicking it. The advantage of Text-File-Output is that you don't have to specify the fields you don't know at design time anyway. An empty field list will just handle all fields coming in, comparable to the total projection in a Table-Input, which will create the necessary fields from the retrieved columns downstream.
If you must produce an Excel workbook, you'll have to learn about metadata injection. That would be a separate project for a beginner, though. There are samples in your Kettle installation folder. And there is a very active community if you find yourself in trouble.

How to run a SQL statement within a field in Pentaho?

I have a table with a 'query' field containing a SELECT SQL statement and another 'parameters' field containing the SQL parameters. I have already merged these two fields into a new field containing a complete, correct SELECT statement. Now I need to execute this new field containing the SELECT, get the result of the SELECT (the output fields), and generate an Excel file.
In your transformation, you can use the Execute row SQL script step (under Scripting), which does exactly what you want: it executes the SQL contained in the selected input field for each row passed to the step.
Please do consider whether that's really what you want. It works fine for a small set of rows or complicated logic, but it is very inefficient compared to batch inserts and updates.
If your SQL statements look like "INSERT [some data]" and you have many rows to process, consider splitting the stream with filter steps, applying calculations or constants to each case to set the values correctly, and then directing everything to a Table Output step.
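For contrast, a per-row statement of the kind the Execute row SQL script step would run, one statement per incoming row (the table and values are made up for illustration):
INSERT INTO target_table (id, name) VALUES (42, 'some value')
With the Table Output alternative, the same values travel as ordinary stream fields and are inserted in batches instead of one statement per row.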

Capture executed SQL from Table Input in Pentaho PDI

I am using Pentaho for data migration testing. I have set up a "Table Input" step where many parts of the query inside the "Table Input" are variables. I have been looking for a way to capture that query after it gets executed at runtime.
I was wondering if there are any specific system log variables for SQL, or whether it has to do with metadata. Need help! Thanks.
Maybe the following approach will help:
We assume a transformation reading a CSV file to get the dynamic portion of the SELECT statement (e.g. the columns) and setting the variable columns with it.
The second transformation uses this variable to generate the SELECT statement and store it into the variable sql_statement.
In the main transformation we use ${sql_statement} as the SELECT statement of the Table Input and write the data to an output file (that's the business process, so to say). From the same input we copy the output to another path. There we add the current time as a field (using the "Get system data" element), add the generated SQL statement, join the two as a Cartesian product, and group the result by sql_statement. That way we can compute the first time and the last time the statement was used. These results are written to a text file.
The last thing we need is a job calling the three transformations sequentially.
This is a sample output:
sql_statement;min_time;max_time
SELECT my_column FROM test_table;2014/05/08 00:41:21.143;2014/05/08 00:41:21.144
Thank you, Marcus! I did something similar.
It works. Awesome.
I gathered the parts of the queries from the table field where they were kept and formed a full query in JavaScript. After that, the full query is sent as a parameter to a transformation that runs and logs the query.