How to pass parameters in Table input? - Pentaho

I have one job with two transformations in it.
The first transformation gets a list of rows, which is passed to the second transformation. The second transformation is then executed once for each row passed from the first.
In the second transformation I have used
"Get rows from result" -> "Table input"
In "Get rows from result" there are five fields, but in the Table input I only need the fields in the 2nd and 3rd positions.
Even if I try to give a single parameter "?", it gives an error:
"
2017/06/29 15:11:02 - Get Data from table.0 - Error setting value #3 [String] on prepared statement
2017/06/29 15:11:02 - Get Data from table.0 - Parameter index out of range (3 > number of parameters, which is 2).
"
My query is very simple:
select * from table where col1 = ? and col2 = ?
How can I achieve this? Am I doing anything wrong?

You can also give names to your parameters, so that your query becomes
select * from table where col1="${param2}" and col2="${param3}".
Don't forget to check the "Replace variables in script" checkbox, and to adapt the quotes to your SQL dialect (e.g. '${param2}' for SQL Server).
Note that param2 and param3 must exist in the transformation's Settings/Parameters, without the ${...} decoration and with default values that don't break the SQL.
The values of the parameters can be set or changed in a previous transformation with a Set Variables step (variables and parameters are synonymous to a first approximation), with a scope of at least "Valid in the parent job".
Of course, if you insist on unnamed parameters for legacy purposes or any other reason, you are responsible for telling PDI that the unused fields are to be discarded, e.g. where (? is null or 0=0) and col1 = ? and col2 = ?.

If you have 5 fields arriving at the Table input, you need to pass 5 parameters to your query, and in the right order. Also, each parameter will be used only once.
So, if you have 5 fields and only use 2 of them, the best way is to put a Select values step between the Get rows from result and Table input steps and let only the actual query parameters through.
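For illustration only (the incoming field names f1 to f5 are made up here), if all five fields were kept and only the 2nd and 3rd were needed, the query would have to consume all five ? markers in order, along these lines:
select * from table
where (? is null or 0=0)   -- f1, discarded
  and col1 = ?             -- f2
  and col2 = ?             -- f3
  and (? is null or 0=0)   -- f4, discarded
  and (? is null or 0=0)   -- f5, discarded
Dropping the unused fields with a Select values step first avoids this and leaves exactly the two ? markers the query actually needs.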

Related

Injection safe "SELECT FROM table WHERE column IN list;" using python3 and sqlite3

I want to take a user input from an HTML form, do a SELECT matching a column against the user input, and be safe from injection. BUT I want the user input to be a comma-separated list. For example, if the column is called "name" and user_input is "Alice,Bob,Carrol", I want to execute the query
SELECT FROM table WHERE name IN ("Alice","Bob","Carrol");
Which means I have the same problem as in this question: select from sqlite table where rowid in list using python sqlite3 — DB-API 2.0.
But of course I do not want to do the string concatenation myself, to avoid injections. At the same time, because there could be any number of commas in user_input, I cannot do this:
db.execute('SELECT FROM table WHERE name IN (?,?,?)', user_input_splited)
I looked for a way to sanitize or escape the input by hand, to be able to do something like this:
db.execute('SELECT FROM table WHERE name IN ?', user_input_sanitized)
But I didn't find it. What's the best approach here?
Write your own code to take the user's input, split() it by comma, then iterate through that list. As you accept each value, push it onto one list, and push a literal "?" onto another.
Of course, verify that you have at least one acceptable value.
Now, construct your SQL statement so that it includes, say, ", ".join(qmarks) to automatically build a string that looks like ?, ?, ? with the appropriate number of question marks. (It won't insert any comma if there's only one element.) Then, having constructed the SQL statement this way, supply the list of values as input to the function that executes the query.
In this way, you supply each value as a parameter, which you should always do, and you allow the number of parameters to vary.
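A minimal Python sketch of that approach (the people table and its name column are made up for the example; only the number of placeholders is built dynamically, never the values themselves):
import sqlite3

def select_names(db, user_input):
    # Split the comma-separated input and drop empty entries.
    names = [n.strip() for n in user_input.split(",") if n.strip()]
    if not names:
        return []  # nothing acceptable to look up
    # One "?" per value; the values are passed separately to execute(),
    # so they are never spliced into the SQL string.
    placeholders = ", ".join("?" for _ in names)
    sql = "SELECT * FROM people WHERE name IN ({})".format(placeholders)
    return db.execute(sql, names).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (name TEXT)")
db.executemany("INSERT INTO people VALUES (?)", [("Alice",), ("Bob",), ("Mallory",)])
print(select_names(db, "Alice,Bob,Carrol"))   # [('Alice',), ('Bob',)]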

IBM SQL NEXT VALUE FOR issue

I am trying to write a simple query to get a sequence number.
EXEC SQL SELECT NEXT VALUE FOR #SOP_SEQ INTO :SEQ ;
EXEC SQL SELECT NEXT VALUE FOR #SOP_SEQ INTO :SEQ FROM #SOP_SEQ;
With the first line of code, I get an error message before I can even compile: SQL0104 Token was not valid. Valid tokens: , FROM
I tried the second line of code and I get this error when I compile:
SQL1103 Position 57 Column definitions for table #SOP_SEQ in *LIBL not found.
Can someone point out what I am doing wrong?
SELECT ... INTO needs a row to run against, and you are not providing any, so there is no result set.
There are two ways to do what you want.
Using SELECT INTO with SYSDUMMY1:
select next value for #sop_seq
into :seq
from sysibm/sysdummy1;
Or, better, using VALUES INTO, which does not need the reference to SYSDUMMY1:
values next value for #sop_seq
into :seq;
tldr;
SYSIBM/SYSDUMMY1 is a catalog file with a single record. Before VALUES INTO became available, it was commonly used to retrieve calculated values into a result set when a single row is required and there is no real table reference that applies (as in your situation here). This technique is still used, but I would advise using VALUES INTO instead, as no artificial FROM clause is necessary.

How to filter rows by length in Kettle

I'm using a Filter rows step to filter out rows with fields that are longer than a given length. Under the filter conditions there is no condition for checking string length.
So the workaround is to use:
Field1 REGEXP [^.{0,80}$]
OR
Field1 IS NULL
Field2 REGEXP [^.{0,120}$]
OR
Field2 IS NULL
Length check is a very common requirement. Is there a function/simpler way to do this that I'm missing?
Use the Data Validator step:
Create a new validation for every column you want to check and set "Max string length" for every validation created.
You can redirect erroneous rows using the "Error handling of step" hop.
By default these rows have the same structure and values as the input rows, but you can also include additional information, such as the name of the erroneous column or an error description.
Alternatively, you can compute a string length before filtering using calculator step, but it may create a lot of additional columns if you have multiple columns to check.
And, of course, you can always perform such checks in User Defined Java Class or Modified Java Script Value.
Assuming you are talking about strings, you can use a Calculator step with the somewhat hard-to-find calculation "Return the length of a string A". That will give you the values for your Filter rows step.
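For example (the derived field names are illustrative), if the Calculator step adds Field1_length and Field2_length with that calculation, the workaround from the question reduces to a plain Filter rows condition:
Field1_length <= 80
AND Field2_length <= 120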

Pentaho Kettle (PDI) table input step with field substitution running slower than using literal

I'll go straight to the point. I have a Table input step which reads records with a query that includes a WHERE clause, as follows:
SELECT id, name, surname, creation_date
FROM users
WHERE creation_date > ?
If I put a literal (e.g. '2017-04-02T00:00:00.000Z') in place of the question mark, this step reads all the new values, which could be thousands, in milliseconds. If I use field substitution with the incoming value, it takes minutes.
Do you know why this could be happening? Do you know how to solve the issue?
Thank you very much for your time.
I found a workaround, not a solution for this particular issue, but it works: instead of getting the value from the previous step and using field substitution (? in the query), I read the value in a previous transformation in the job, store it in the variable space, and read it from there using variable substitution ('${variable_name}' in the query). It works just as fast as if the value were hard-coded.
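As a sketch of that workaround (the variable name LAST_CREATION_DATE is made up here): the earlier transformation sets the variable with a Set Variables step, "Replace variables in script" is checked on the Table input step, and the query becomes
SELECT id, name, surname, creation_date
FROM users
WHERE creation_date > '${LAST_CREATION_DATE}'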

pentaho database join error to match input data

I have an input.csv file which contains a field "id".
I need to do a database lookup with the logic below.
I need to search whether the "id" is present in the field "supp_text" and extract the field "loc_id".
Eg,
id = 12345.
and, in my supp_text, I have the value "the value present is 12345".
I am using "Database join" function to do this.
viz.
*select loc_id from SGTABLE where supp_text like '%?%';*
and, i am passing "id" as a parameter.
I get the below error when I run.
"Couldn't get field info from [select LOC_ID from SGTABLE WHERE SUPP_TEXT like '%?%']"
offending row : [ID String(5)]
All inputs are strings, and the table fields are VARCHAR.
I tried the "Database lookup" step too, but it does not have an option to match a substring within a string.
Please help.
The JDBC driver is not replacing the parameter within the string. You must build the wildcard string first and pass the whole thing as a parameter. Here is a quick transform I threw together that does just that.
Note that in the Database join step the SQL does not have '' quotes around the parameter. Note also that, unless used properly, the Database join step can be a performance killer. This, however, looks to be a reasonable use of it if there are going to be a lot of different wildcard values (unlike in my transform).
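A sketch of that approach (the field name id_wildcard is made up here): a prior step, e.g. a Calculator, builds a field id_wildcard holding the value '%12345%', and the Database join SQL is then simply
select LOC_ID from SGTABLE where SUPP_TEXT like ?
with id_wildcard passed as the single parameter, so neither quotes nor % signs appear in the SQL itself.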