Loop in Kettle/Spoon/Pentaho


I have a query like this:
SELECT count(distinct ID) FROM TBLC WHERE date BETWEEN ? AND ?;
I am using Pentaho Spoon with the 'Execute SQL Script' step. The options I see are "Execute for each row", "Execute as a single statement" and "Variable substitution".
If I need to change my query or add other steps to make this work, please let me know.
EDIT:
I am using a Pentaho Spoon to transfer data from Infobright database (table1, table2) to Infobright database (table3).
Query is similar to:
SELECT table1.column1, table2.column2
FROM table1 JOIN table2 ON table1.id=table2.id
WHERE table2.date BETWEEN '2012-12-01' AND '2012-12-30'
I want a way so that I do not have to manually specify the date range each time I run the transformation. I want to automate the date range.
Thanks in advance.

Based on what you've described, I believe you can accomplish what you want by using a generate rows step to inject rows into the stream containing the dates you want, then generate the needed queries for each date row in the stream to get all the rows you want from the source tables.

You can use "Execute as a single statement" together with "Variable substitution"; they are the options best suited to your use case.
Add parameters StartDate and EndDate to your transformation and use them in your query as shown below. Enable "Variable substitution" in the Execute SQL Script step.
SELECT table1.column1, table2.column2
FROM table1 JOIN table2 ON table1.id=table2.id
WHERE table2.date BETWEEN '${StartDate}' AND '${EndDate}'
Supply values for StartDate and EndDate when executing the transformation.
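To avoid typing the dates by hand, the two values could be computed by whatever launches the transformation. Here is a minimal sketch in Python, assuming the window you want is the previous calendar month (the question does not say which range is needed, so that choice and the function name previous_month_range are only illustrative):

```python
from datetime import date, timedelta
from typing import Tuple


def previous_month_range(today: date) -> Tuple[str, str]:
    """Return (first day, last day) of the month before `today` as ISO strings."""
    first_of_this_month = today.replace(day=1)
    # Stepping back one day from the 1st lands on the last day of the previous month.
    last_of_prev = first_of_this_month - timedelta(days=1)
    first_of_prev = last_of_prev.replace(day=1)
    return first_of_prev.isoformat(), last_of_prev.isoformat()


# Fixed date for a reproducible example; use date.today() in a scheduled run.
start_date, end_date = previous_month_range(date(2013, 1, 9))
print(start_date, end_date)  # prints: 2012-12-01 2012-12-31
```

The two strings would then be supplied as the StartDate and EndDate parameters of the transformation.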

I guess the dates are stored in a table or a file.
What you can do is:
Create a job that reads those parameters into the stream and sets them as variables.
In the next job you can use them as variables in your query, as ${date_from} and ${date_to}.
That way, each time you run the jobs they pick up whatever is in the database.
You will of course need to keep date_from and date_to up to date.

Related

Query just runs, doesn't execute

My query just keeps running and never finishes. What is wrong? I am working in Oracle SQL Developer on a company server.
CREATE TABLE voice2020 AS
SELECT
to_char(SDATE , 'YYYYMM') as month,
MSISDN,
SUM(CH_MONEY_SUBS_DED)/100 AS AIRTIME_VOICE,
SUM(CALLDURATION/60) AS MIN_USAGE,
sum(DUR_ONNET_OOB/60) as DUR_ONNET_OOB,
sum(DUR_ONNET_IB/60) as DUR_ONNET_IB,
sum(DUR_ONNET_FREE/60) as DUR_ONNET_FREE,
sum(DUR_OFFNET_OOB/60) as DUR_OFFNET_OOB,
sum(DUR_OFFNET_IB/60) as DUR_OFFNET_IB,
sum(DUR_OFFNET_FREE/60) as DUR_OFFNET_FREE,
SUM(case when sdate < to_date('20190301','YYYYMMDD')
then CH_MONEY_PAID_DED-nvl(CH_MONEY_SUBS_DED,0)-REV_VOICE_INT-REV_VOICE_ROAM_OUTGOING-REV_VOICE_ROAM_Incoming
else (CH_MONEY_OOB-REV_VOICE_INT-REV_VOICE_ROAM_OUTGOING-REV_VOICE_ROAM_Incoming) end)/100 AS VOICE_OOB_SPEND
FROM CCN.CCN_VOICE_MSISDN_MM#xdr1
where MSISDN IN ( SELECT MSISDN FROM saayma_a.BASE30112020) --change date
GROUP BY
MSISDN,
to_char(SDATE , 'YYYYMM')
;

This is a performance issue. Clearly the query driving your CREATE TABLE statement is taking too long to return a result set.
You are querying a table in a remote database (CCN.CCN_VOICE_MSISDN_MM#xdr1) and then filtering against a local table (saayma_a.BASE30112020). This means you are going to copy all of that remote table across the network, then discard the records which don't match the WHERE clause.
You know your data (or at least you should know it): does that sound efficient? If you're actually discarding most of the records you should try to filter CCN_VOICE_MSISDN_MM in the remote database.
If you need more advice you need to provide more information. Please read this post about asking Oracle tuning questions on this site, then edit your question to include some details.

You are executing a CTAS (CREATE TABLE AS SELECT) statement. Its purpose is to create a table populated with the data that the query returns.
If you just want to run the query and see the data, remove the first line of your query.
-- CREATE TABLE voice2020 AS
SELECT
.....
Also, if you have already executed the statement once successfully, the data from your query should already be in the voice2020 table.
Select * from voice2020;

It looks like you are trying to copy data from one table to another. Can you create the table first, if it does not already exist, and then try this statement?
insert into target_table select * from source_table;

The transfer of the variable to the query Pentaho

I have a job where I want to pass max_id from one query to another query.
In Table Input I have the query
select max(uid) from table_1
in Set Variables I have
And I want to pass the variable to a query on another table.
In Table Input I have the query:
select * from table2 where uid BETWEEN 407043 and ${MAX_UID}
But it doesn't work

You don’t need the Get Variables step. Just use that query and don’t forget to check the “replace variables in script” box.

Delete Query inside Where clause

Is it possible to write a delete query inside a WHERE clause?
Example:
Select ID,Name From MyTable Where ID IN(Delete From MyTable)
It may sound crazy, but let me explain my situation. Our reporting tool lets users enter a SQL WHERE filter.
We use our own SELECT and FROM clauses and combine them with the user's WHERE input.
Example:
Select ID,Name From MyTable Where ("Query typed by user")
Here, the user can type any kind of WHERE filter.
If he types something like ID=100, our final query becomes:
Select ID,Name From MyTable Where (ID=100)
One of our customers asked us what will happen if someone types a delete query as the WHERE filter. He feels this may be a security hole, so we tried that possibility in our dev environment. But SQL returns an error for the following query.
Select ID,Name From MyTable Where ID IN(Delete From MyTable)
So finally, my question is: is there any other way to write a delete query inside a WHERE clause or SELECT clause? If it is possible, how can I restrict it?

Yes. They can run a delete. They can type:
1 = 1; DELETE FROM MY_TABLE;
Or even worse in some ways, (since you should have backups):
1 = 0 UNION SELECT SOCIAL_SECURITY_NUMBER, CREDIT_CARD_NUMBER, OTHER_SENSITIVE_DATA FROM MY_SENSITIVE_TABLE;
Now, in your case it's hard to validate. Normally, if you are just passing a value to filter on, you can use parameterised SQL to save yourself. You, however, also need to let the user select a column. In cases like these, we usually use a drop-down that lets the user pick from a predefined list of columns, and then validate the column name server side. We give the user a text box to enter the value to match and then parameterise that.
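The whitelist-plus-parameter approach described above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module, not the asker's actual reporting tool; ALLOWED_COLUMNS and run_filtered_query are hypothetical names, and the table is a stand-in for MyTable.

```python
import sqlite3

# Hypothetical whitelist: the only column names the drop-down may send.
ALLOWED_COLUMNS = {"ID", "Name"}


def run_filtered_query(conn, column, value):
    """Validate the column server side, then bind the value as a parameter."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column not allowed: {column!r}")
    # The column name is safe to interpolate only because it passed the whitelist.
    sql = f"SELECT ID, Name FROM MyTable WHERE {column} = ?"
    return conn.execute(sql, (value,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Name TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", [(100, "alice"), (200, "bob")])

rows = run_filtered_query(conn, "ID", 100)
# A hostile value is treated as plain data, so it matches nothing and deletes nothing.
hostile = run_filtered_query(conn, "Name", "x'; DELETE FROM MyTable; --")
remaining = conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]
```

The user never gets to write SQL text: the column comes from a fixed list and the value is only ever bound as a parameter.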

It's not possible in exactly that form. But he can do something like this:
Select ID,Name From MyTable Where (ID=100); (DELETE FROM MyTable Where 1 = 1)
by typing ID=100); (DELETE FROM MyTable Where 1 = 1 as the filter instead of ID=100

I believe what your customer is talking about is SQL injection. As long as you have taken appropriate measures to block other queries from running after your select statement is done, you should have no problem letting users type whatever they want.
From my experience there is no way to delete anything from within a select statement.
Just make sure you handle query terminator characters so they can't write something like the following.
select column1, column2 from myTable where ID in (1,2); delete from myTable
This would be a valid worry from your customer if you aren't taking proper steps to prevent SQL injection.
You could also give your SQL reporting tool read permission only, with no update or delete permission. However, it is up to you how you handle SQL injection security.

Pentaho Kettle Spoon Date manipulation

I am using Pentaho Spoon to do some transformation. I am using 'Table Input' and joining multiple tables to get final output table.
I need to achieve:
SELECT COUNT(distinct ID)
FROM TBLA join TBLB ON TBLA.ID=TBLB.ID
WHERE
TBLA.ID=334
AND TBLA.date = '2013-1-9'
AND TBLB.date BETWEEN '2012-11-15' AND '2013-1-9';
I am manually inserting '2012-11-15' but I am using Get System Data to insert '2013-1-9'. I am using one Get System Data step.
My query is:
SELECT COUNT(distinct ID)
FROM TBLA join TBLB ON TBLA.ID=TBLB.ID
WHERE
TBLA.ID=334
AND TBLA.date='?'
AND TBLB.date BETWEEN '2012-11-15' AND '?';
I get an error message in Table Input saying: No value specified for parameter 2
Any suggestions will be appreciated.
Thank you.

This is a simple one: you need to "duplicate" the system date. Add another line in "Get System Data" called "date2" or similar, make it the same as the first line, and it will fill in the second parameter (the second ?).
Or simply change the query to say BETWEEN '2012-11-15' AND TBLA.date,
then you don't need the second parameter.
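The underlying rule is that every ? placeholder consumes one input value, so a date used twice has to be supplied twice. The sketch below shows the same behaviour outside Pentaho, using Python's sqlite3 module with made-up rows; it is an analogy for the Table Input step, not Pentaho code. Note that the placeholders are left unquoted and the dates are zero-padded so that they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBLA (ID INTEGER, date TEXT)")
conn.execute("CREATE TABLE TBLB (ID INTEGER, date TEXT)")
conn.execute("INSERT INTO TBLA VALUES (334, '2013-01-09')")
conn.executemany(
    "INSERT INTO TBLB VALUES (?, ?)",
    [(334, "2012-12-01"), (334, "2012-10-01")],
)

# Two unquoted placeholders: each one needs its own value.
sql = """
SELECT COUNT(DISTINCT TBLA.ID)
FROM TBLA JOIN TBLB ON TBLA.ID = TBLB.ID
WHERE TBLA.ID = 334
  AND TBLA.date = ?
  AND TBLB.date BETWEEN '2012-11-15' AND ?
"""
run_date = "2013-01-09"

try:
    conn.execute(sql, (run_date,))  # only one value for two placeholders
    one_value_ok = True
except sqlite3.ProgrammingError:
    one_value_ok = False  # the driver complains about the missing second value

# Supplying the same date twice satisfies both placeholders.
count = conn.execute(sql, (run_date, run_date)).fetchone()[0]
```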

Personally I prefer the pattern of a Get System Info/Add Constants step to create one row with multiple columns that feeds into a Database Join step. Then you replace parameters in your query with columns instead of rows, and you can specify a column more than once.

SQL results returned in tablename.fieldname format

When I run a query that is selecting fields from multiple tables, i.e. for a join, I would do something like:
SELECT table1.field1, table2.field2 FROM table1 JOIN table2 ON table1.field1 = table2.field1;
When the results of a query like this are returned, the array only has the field names as the index, not the combination of tablename.fieldname. I am wondering if it is possible to get the data returned in this format, because it is required by a plugin that I use.
Thank you.
EDIT: I thought of doing an alias, but the way that the plugin works is it takes the columns I give it literally, and then looks for those EXACT names in the array. Basically I give it the columns, it dynamically gets the data and displays it. So if I give it 'table1.field1 AS alias', it will look for that but the resulting field will be just 'alias'.
EDIT 2: I found a solution to my problem, though it was not a SQL solution. I manipulated the resulting array of the query with PHP to get it back into the format that I needed it to be in.

One option is to use alias for columns.

There is no way to do it automatically. You can just alias them, though -
SELECT table1.field1 as [table1.field1], table2.field2 as [table2.field2] FROM table1 JOIN table2 ON table1.field1 = table2.field1
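The square-bracket quoting above is SQL Server style; standard SQL quotes identifiers with double quotes and MySQL uses backticks. As a rough check that a dotted alias really does come back as the result key, here is a small sketch using Python's sqlite3 module with double-quoted aliases. The tables and values are made up, and the asker's PHP driver may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read the column names of each row
conn.execute("CREATE TABLE table1 (field1 INTEGER)")
conn.execute("CREATE TABLE table2 (field1 INTEGER, field2 TEXT)")
conn.execute("INSERT INTO table1 VALUES (1)")
conn.execute("INSERT INTO table2 VALUES (1, 'x')")

# Each alias is a quoted identifier that happens to contain a dot.
row = conn.execute(
    'SELECT table1.field1 AS "table1.field1", table2.field2 AS "table2.field2" '
    "FROM table1 JOIN table2 ON table1.field1 = table2.field1"
).fetchone()

keys = row.keys()
record = {key: row[key] for key in keys}
```

Here record ends up keyed by 'table1.field1' and 'table2.field2', which is the shape the plugin in the question expects.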
