Pentaho Kettle: how to execute "insert into ... select from" with the SQL script step?

I am discovering Pentaho DI and I am stuck with this problem:
I want to insert data from a CSV file into a custom DB which does not support the "insert table" step. So I would like to use the SQL script step, with one statement:
INSERT INTO myTable
SELECT * FROM myInput
And my transformation would look like this:
I don't know how to get all my data from the CSV injected into the "myInput" part of the query.
Could someone help me?
Thanks a lot :)

When you first edit the SQL Script step, click the 'Get fields' button. This loads the parameters (the fields from your CSV) into the box in the bottom left corner. Delete the parameters (fields) you don't want to insert.
In your SQL script, write your query something like the one below, where the question marks are your parameters, in order.
insert into my_table (field1,field2,field3...) values ('?','?','?'...);
Tick the 'Execute for each row' and 'Execute as a single statement' checkboxes. That's really about it. Let me know if you have any more questions, and if you provide sample data I'll make you a sample .ktr file to look at.
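For instance, a minimal sketch assuming the CSV file input provides the fields id, name and amount (the field names here are hypothetical, not from the question):
-- Sketch: parameters listed in the step, in this order: id, name, amount
-- With 'Execute for each row' ticked, each ? is replaced by that row's value;
-- string fields such as name still need the surrounding quotes.
insert into myTable (id, name, amount) values (?, '?', ?);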

I think you are going about this the wrong way. You should use a CSV file input step and a Table output step.
As rwilliams said, in the CSV file input step use Get fields. More importantly, the Table output step has a Database fields tab where entering the field mapping is the right choice, and the Guess function works remarkably well.
What's more, the tool can generate the CREATE TABLE SQL statement for you when the target table does not yet exist on the target connection's DB server.
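As an illustration, a sketch of the kind of statement the SQL button on the Table output step can generate (the column names and types below are hypothetical):
CREATE TABLE myTable
(
  id INT
, name VARCHAR(255)
, amount DECIMAL(10, 2)
)
;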

Use the following code:
with cte as
(
SELECT * FROM myInput
)
select *
into myTable
from cte;

Related

SSIS save value as a parameter

I am using a SELECT UpdateDate FROM dbo.log command in an Execute SQL Task. I'm fairly new to this, so please bear with me. I want to store the value in a variable and then pass it into the WHERE clause of a subsequent data flow. My questions are:
What is the correct way to set up the Execute SQL Task? On the General page I have the OLE DB connection and direct input with the query above. Result Set is set to Single row, and I am storing the result in a variable I have created called User::UpdateDate. For some reason this doesn't work.
I then want to use this date in a data flow, i.e. SELECT * FROM Users WHERE RecordDate > User::UpdateDate. I believe the syntax is different for this.
I would really appreciate some help with this. Many thanks in advance.
In your Execute SQL Task Editor, configure the Parameter Mapping as shown below. Obviously use your own variable; in this example I'm using PackageStartTime.
Then in your SQL statement, use the following:
SELECT * FROM Users WHERE RecordDate > ?
To save a value from a SQL statement, you will need to set the Result Set to Single row and configure the result set as shown in the example below:
Execute SQL Task with ResultSet
First of all, create a variable of DateTime type, for example #[User::UpdateDate].
Add an Execute SQL Task, select the OLEDB connection, and use the following command as the SQL Statement (a note on ordering follows the result set values below):
SELECT TOP 1 UpdateDate FROM dbo.log
Set the ResultSet property to Single row and, in the Result Set tab, add a row with the following values:
ResultName = 0 (which means the first column)
VariableName = #[User::UpdateDate]
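Note that TOP 1 without an ORDER BY returns an arbitrary row; a minimal sketch, assuming you want the most recent date in dbo.log (the question does not say which row is wanted):
SELECT TOP 1 UpdateDate
FROM dbo.log
ORDER BY UpdateDate DESC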
Additional Information
SSIS Basics: Using the Execute SQL Task to Generate Result Sets
OLEDB Source with parameterized SQL Command
Inside the Data Flow Task, add an OLEDB Source and set the Access Mode to SQL Command. Then write the following command:
SELECT * FROM Users WHERE RecordDate > ?
Click on the Parameters button and map the variable #[User::UpdateDate] as the first parameter.
Additional Information
Map Query Parameters to Variables in a Data Flow Component
Parameterized OLEDB source query

Transition from MS SQL to Pentaho Kettle

I have a few MS SQL scripts which I would like to migrate to Kettle. Ideally, I would like each step of the script to be a single step in Kettle. But I am finding it difficult to wrap my head around how the MS SQL statements map to the related Kettle steps. Could someone please elaborate on which Kettle step can be used to do each of the following:
select * from [table] - This one is obviously [Input->Table input]
ALTER TABLE [table] ADD [fieldname] [nvarchar](255)
UPDATE b SET b.b_field = a.a_field
FROM [table_a] a
INNER JOIN [table_b] b
ON right(b.b_identity,19)=a.a_identity
where b.b_field is null
Step 3 is repeated for many other tables, with different fields being compared.
Thank you.
You can't simply translate it step by step. You have to replicate the functionality, but you can't map SQL statements one-to-one onto PDI steps; it's a completely different paradigm.
As a quick and dirty way to migrate SQL scripts to Kettle, you have the Execute SQL script step, into which you can copy/paste your script as is.
Still on the quick and dirty side, note that you can put more than one statement in the Table input, provided they are separated by commas. You can even create temporary tables with SELECT INTO, index them, and read from them.
But obviously this is not really clean. For (2), you can produce a flow containing the table name and field name, then use a Javascript step to write a column containing the text "ALTER TABLE [table-name] ADD [field-name] NVARCHAR(255)", and then a Dynamic SQL row step to execute that statement for each input row.
For (3), the principle is to create the input flow with a Table input containing "SELECT a.a_field FROM [table_a] a INNER JOIN [table_b] b ON RIGHT(b.b_identity,19)=a.a_identity", and then to update table_b with an Update step. I cannot really help much further there, since I do not see the key of table_b to use for the update.
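For illustration, a sketch of that Table input query, assuming b_identity is the key you would give the Update step (that assumption is not in the original question):
SELECT b.b_identity, a.a_field
FROM [table_a] a
INNER JOIN [table_b] b
  ON RIGHT(b.b_identity, 19) = a.a_identity
WHERE b.b_field IS NULL
The Update step would then look up table_b on b_identity and set b_field from a_field.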
When this is done and tested for one table and one field, you can put these values in parameters and use a Job to loop over the parameters.
You have an example of this use case in the samples directory shipped with your distribution. It sits in the same folder as your spoon.bat, and the job of interest to you is samples/transformations/dynamic-table/Dynamic table creation and population.kjb.

How can I link values in a text file to values in a SQL Server table?

UPDATE 2
Essentially, I'd like to pass the dataTable into SQL Server, but not actually "create it", just like a temporary thing using:
cmd.CommandText = "SELECT * " & _
                  "FROM table1 " & _
                  "INNER JOIN dataTable " & _
                  "ON sample.name = dataTable.name"
How can I pass the dataTable from vb.net (.NET 2.0) to SQL Server in a similar fashion?
UPDATE 1
So I'm thinking maybe passing the data from the text file into a datatable and using that to compare against the SQL Server table? How would I go about doing that if at all possible?
ORIGINAL POST
I have a table in SQL Server 2012 (i.e., dbo.sample1); within the table there is a column that contains names (e.g., abc01, abc02, abc03, hijk01, hijk02...).
I ran some VB code to extract certain file names, without extensions, from a directory on my machine (e.g., abc01, abc02...) that met certain conditions; these file names were saved on separate lines within a text file.
Is there any easier, more convenient way of linking ONLY the names in my text file to the ones in my table, so as not to show any rows that are not in the text file? I figured I could sit and plug in a bunch of WHERE name = 'abc01'... but I didn't really want to sit there and do that for all of the names I have. And I'm not sure this would even work correctly, as I need to do an INNER JOIN of 2 tables in the DB to the values in the text file.
If this is a long problem, then please point me in the right direction and I can research and move forward with it, but any help is greatly appreciated, thanks!
If the list of filenames is reasonable, you can use the IN() clause:
SELECT * from XXX WHERE name IN ('abc01', 'abc02', 'abc03',...)
If the list is too long, look into bulk copying it up to a temp table to use with a JOIN.
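A minimal sketch of that approach (the file path and column size are assumptions, not from the question):
-- Staging table for the names in the text file
CREATE TABLE #filenames (name NVARCHAR(255));

-- Load the text file, one name per line; from VB.NET you could use SqlBulkCopy instead
BULK INSERT #filenames FROM 'C:\temp\names.txt' WITH (ROWTERMINATOR = '\n');

-- Keep only the rows whose name appears in the file
SELECT s.*
FROM dbo.sample1 s
INNER JOIN #filenames f ON f.name = s.name;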

Execute SQL step in pentaho

I have created a transformation which includes a Table input, an Execute SQL step and an Excel output step.
Table input --> runs a query and returns the field "query", which contains the SQL query select * from dual.
Execute SQL step --> dynamically passes that query field using '?' and enables variable substitution.
Excel output --> the expected output is that the SQL query is executed and its result lands in the Excel output.
But I can't get the fields from the Execute SQL step. How can I do this?
Thanks
Kavitha S
Use a Database join step instead of the Execute SQL step. The Database join step allows you to run a query against a database using data obtained from previous steps.
Database join input: you can pass any data you want from the previous step using the ? notation in the SQL query defined inside the step.
Database join output: it executes the parameterized SQL query and appends the returned fields to the output.
This step is what you need for your 2nd step. See more info about the Database join step in the documentation.
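For example, a minimal sketch of a query inside a Database join step, assuming the incoming stream carries a field named code (the table and field names here are hypothetical):
SELECT d.description
FROM dim_table d
WHERE d.code = ?
The ? is filled, row by row, with the stream field listed in the step's parameters grid, and the returned description column is appended to the stream.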
In PDI, the "Execute SQL" step is not meant for generating rows; it will not add any extra rows to the data stream. The Table input step is what generates rows.
What you can try as an alternative is to break the transformation into two parts.
Part 1: Table input step > (query rows are generated) >> use "Set variables" or "Copy rows to result" in a further step to store the query in a variable, e.g. query.
Part 2: take another Table input step (in a next .ktr file) and use the variable substitution ${query} >> finally output the result set to the Excel output.
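As a rough sketch of the two parts (the statements are illustrative, not from the question):
-- Part 1 Table input: produce the statement to run later as a field named query
SELECT 'select * from dual' AS query FROM dual;
-- Part 2 Table input (next .ktr), with 'Replace variables in script?' enabled,
-- contains nothing but the variable reference:
${query}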
For dynamic SQL queries, you can read this blog.
In case you have some lookups to do with the generated query, you can use the Dynamic SQL row step to generate the rows.
Hope it helps :)

Generating SQL INSERT INTO for Oracle

The only thing I don't have an automated tool for when working with Oracle is a program that can create INSERT INTO scripts.
I don't desperately need it so I'm not going to spend money on it. I'm just wondering if there is anything out there that can be used to generate INSERT INTO scripts given an existing database without spending lots of money.
I've searched through Oracle with no luck in finding such a feature.
It exists in PL/SQL Developer, but it errors out on BLOB fields.
Oracle's free SQL Developer will do this:
http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index.html
You just find your table, right-click on it and choose Export Data->Insert
This will give you a file with your insert statements. You can also export the data in SQL Loader format as well.
You can do that in PL/SQL Developer v10.
1. Click on Table that you want to generate script for.
2. Click Export data.
3. Check that the table you want to export data for is selected.
4. Click on SQL inserts tab.
5. Add where clause if you don't need the whole table.
6. Select the file where your SQL script will be saved.
7. Click export.
Use a SQL function (I'm the author):
https://github.com/teopost/oracle-scripts/blob/master/fn_gen_inserts.sql
Usage:
select fn_gen_inserts('select * from tablename', 'p_new_owner_name', 'p_new_table_name')
from dual;
where:
p_sql – dynamic query which will be used to export metadata rows
p_new_owner_name – owner name which will be used for generated INSERT
p_new_table_name – table name which will be used for generated INSERT
p_sql in this sample is 'select * from tablename'
You can find original source code here:
http://dbaora.com/oracle-generate-rows-as-insert-statements-from-table-view-using-plsql/
Ashish Kumar's script generates individually usable insert statements instead of a SQL block, but supports fewer datatypes.
I have been searching for a solution for this and found it today. Here is how you can do it.
Open Oracle SQL Developer Query Builder
Run the query
Right click on result set and export
http://i.stack.imgur.com/lJp9P.png
You might execute something like this in the database:
select 'insert into targettable(field1, field2, ...) values(' || field1 || ', ' || field2 || ... || ');'
from targettable;
Something more sophisticated is here.
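Note that character columns need their values wrapped in quotes (and embedded quotes doubled) in the generated statement; a sketch assuming field2 is a VARCHAR2 column:
select 'insert into targettable (field1, field2) values (' ||
       field1 || ', ''' || replace(field2, '''', '''''') || ''');'
from targettable;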
If you have an empty table, the Export method won't work. As a workaround, I used the Table view of Oracle SQL Developer, clicked on Columns, sorted by Nullable so that NO was on top, and then selected these non-nullable columns using shift+click to select the range.
This allowed me to write one base insert by hand, so that Export could then prepare a proper all-columns insert.
If you have to load a lot of data into tables on a regular basis, check out SQL*Loader or external tables. They should be much faster than individual INSERTs.
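As a minimal external table sketch (the directory object, file name and columns are hypothetical; the directory object must already exist and point at the folder holding the file):
CREATE TABLE staging_ext (
  field1 NUMBER,
  field2 VARCHAR2(255)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('data.csv')
);

-- Load with one set-based statement instead of row-by-row inserts
INSERT INTO targettable SELECT * FROM staging_ext;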
You can also use MyGeneration (a free tool) to write your own SQL generation scripts. There is an "insert into" script for SQL Server included with MyGeneration, which can easily be changed to run under Oracle.