I have a job where I want to pass the max_id produced by one query into another query.
In a Table Input step I have the query:
select max(uid) from table_1
In a Set Variables step I set the variable MAX_UID from that result, and I want to pass it on to another table.
In the second Table Input I have the query:
select * from table2 where uid BETWEEN 407043 and ${MAX_UID}
But it doesn't work
You don't need the Get Variables step. Just use that query as-is, and don't forget to tick the "Replace variables in script?" checkbox in the Table Input step.
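A minimal end-to-end sketch (assuming the aggregate is aliased so the Set Variables step can pick up the field name; the alias is my addition, not from the question):

-- first Table Input: alias the aggregate so Set Variables sees a field named MAX_UID
select max(uid) as MAX_UID from table_1

-- second Table Input, with "Replace variables in script?" checked:
select * from table2 where uid between 407043 and ${MAX_UID}

Note that in Kettle a variable set by Set Variables is generally not visible later in the same transformation; the usual pattern is two transformations run in sequence by the parent job.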
I have a long query and I'm looking for a way to simplify it for whoever executes it.
For example, I have this query:
select function_1(r_set)
from (select collect_set(records) as r_set
      from (select function_2(<column_name>) as records
            from <table_name>) as record_t) as record_set_t;
function_1 and function_2 are custom UDFs.
Since everything besides the table name and column name is constant, is it possible to define some kind of alias or procedure for the query, with the column name and table name as parameters?
Or even to wrap it somehow in a shorter execution command?
I'm looking for something like:
# set alias for long query somehow
set MyQueryAlias = select function_1(r_set) from (select collect_set(records) as r_set from (select function_2(<column_name>) as records from <table_name>) as record_t) as record_set_t;
# execute the query with table name and column name as a parameter
exec MyQueryAlias <table_name> <column_name>
My purpose is to make it easy for other users to use the saved query on different tables and columns.
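For what it's worth, Hive's built-in variable substitution (hivevar) can approximate this when the query lives in a script file. This is a hedged sketch of that approach, not something from the thread; the variable names are illustrative:

-- my_query.hql, invoked as:
--   hive --hivevar tbl=my_table --hivevar col=my_column -f my_query.hql
select function_1(r_set)
from (select collect_set(records) as r_set
      from (select function_2(${hivevar:col}) as records
            from ${hivevar:tbl}) as record_t) as record_set_t;

This keeps the long query in one shared file, and other users only supply the table and column names on the command line.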
In SAS PROC SQL I always use a process like:
select v1, v2, v3, ..., v10
from table_1;
However, if I want to select 99 variables out of 100, it is impractical to write all of them out in the SELECT clause.
I want to know if there is a better way to exclude one or a few specific variables from the selection.
I have tried selecting all of the variables and then using ALTER TABLE to drop the unwanted ones:
select *
from table_1;
alter table table_1
drop var50, var51;
It would be very helpful if anyone could give me some suggestions. Thank you~
Sure, that is possible in SAS SQL:
select * from table_1(drop=var50 var51);
But be warned: the variables are dropped before the data is read, so they cannot be referenced anywhere else in the statement. For example,
select * from table_1(drop=var50 var51) where var50=1;
will not work. But when you use it to create a new dataset, you can rewrite it as:
create table table_2(drop=var50 var51) as
select * from table_1 where var50=1;
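Wrapped in a full PROC SQL block, the rewrite looks like this (a minimal sketch; here drop= is applied to the output table, so var50 is still readable in the WHERE clause):

proc sql;
  /* drop= on the output table: var50/var51 are read, filtered on, then dropped */
  create table table_2(drop=var50 var51) as
  select *
  from table_1
  where var50=1;
quit;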
I need to dynamically obtain a table name from a system table and then run a SELECT query against that table. Example:
SELECT "schema"+'.'+"table" FROM SVV_TABLE_INFO WHERE "table" LIKE '%blabla%'
It returns my_schema.the_main_blabla_table.
After I get this table name, I need to perform:
SELECT * FROM my_schema.the_main_blabla_table LIMIT 100
Is it possible to do this in a single query?
If you are talking about a subquery after FROM, I can say that you can do this.
You will get something like this:
SELECT * FROM
(
SELECT "schema"+'.'+"table" FROM SVV_TABLE_INFO WHERE "table" LIKE '%blabla%'
)
LIMIT 100
Unfortunately, I can't test it on your data, but I am very interested in the result because I have never done anything like this. If I have misunderstood your question, please tell me.
Amazon Redshift does not support the ability to take the output of a query and use it as part of another query.
Your application will need to query Redshift to obtain the relevant table name(s), then make another call to Redshift to query that table.
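In other words, the pattern is two round trips (a sketch using the names from the question):

-- call 1: the application resolves the table name
SELECT "schema"+'.'+"table" AS full_name
FROM SVV_TABLE_INFO
WHERE "table" LIKE '%blabla%';

-- call 2: the application substitutes the returned name into the next statement
SELECT * FROM my_schema.the_main_blabla_table LIMIT 100;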
I am using Pentaho Spoon to do some transformations. I am using Table Input and joining multiple tables to produce the final output table.
I need to achieve:
SELECT COUNT(distinct ID)
FROM TBLA join TBLB ON TBLA.ID=TBLB.ID
WHERE
TBLA.ID=334
AND TBLA.date = '2013-1-9'
AND TBLB.date BETWEEN '2012-11-15' AND '2013-1-9';
I am manually inserting '2012-11-15', but I am using Get System Data to supply '2013-1-9'. I am using one Get System Data step.
My query is:
SELECT COUNT(distinct ID)
FROM TBLA join TBLB ON TBLA.ID=TBLB.ID
WHERE
TBLA.ID=334
AND TBLA.date='?'
AND TBLB.date BETWEEN '2012-11-15' AND '?';
I get an error message in Table Input saying "No value specified for parameter 2".
Any suggestion will be appreciated.
Thank you.
This is a simple one: you need to "duplicate" the system date. Add another line in Get System Data called "date2" or something, make it the same as the first line, and it will fill in the second parameter (?).
Or simply change the query to say BETWEEN '2012-11-15' AND TBLA.date;
then you don't need the second parameter.
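That second option would look like this (a sketch based on the query in the question; note that the remaining placeholder is not quoted, since a quoted '?' is treated as a literal string rather than a parameter):

SELECT COUNT(DISTINCT ID)
FROM TBLA JOIN TBLB ON TBLA.ID = TBLB.ID
WHERE TBLA.ID = 334
  AND TBLA.date = ?
  AND TBLB.date BETWEEN '2012-11-15' AND TBLA.date;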
Personally I prefer the pattern of a Get System Info/Add Constants step to create one row with multiple columns that feeds into a Database Join step. Then you replace parameters in your query with columns instead of rows, and you can specify a column more than once.
I have a query like this:
SELECT count(distinct ID) FROM TBLC WHERE date BETWEEN ? AND ?;
I am using Pentaho Spoon, with the 'Execute SQL Script' step. The options I see are "Execute for each row", "Execute as a single statement", and "Variable substitution".
If I need to change my query or implement other steps, please respond.
EDIT:
I am using Pentaho Spoon to transfer data from an Infobright database (table1, table2) to an Infobright database (table3).
Query is similar to:
SELECT table1.column1, table2.column2
FROM table1 JOIN table2 ON table1.id=table2.id
WHERE table2.date BETWEEN '2012-12-01' AND '2012-12-30'
I want a way so that I do not have to manually specify the date range each time I run the transformation. I want to automate the date range.
Thanks in advance.
Based on what you've described, I believe you can accomplish what you want by using a generate rows step to inject rows into the stream containing the dates you want, then generate the needed queries for each date row in the stream to get all the rows you want from the source tables.
You can use "execute as a single statement" and "variable substitution", as they are best suited to your use case.
Add parameters StartDate and EndDate to your transformation and use them in your query as shown below. Enable "Variable Substitution" in the Execute SQL Script step.
SELECT table1.column1, table2.column2
FROM table1 JOIN table2 ON table1.id=table2.id
WHERE table2.date BETWEEN '${StartDate}' AND '${EndDate}'
Supply the values of StartDate and EndDate when executing the transformation.
I guess the dates are in a table or a file in the database.
What you can do is:
create a job that reads those parameters into the stream and sets variables.
In the next transformation you can use them as variables in your query via ${date_from} and ${date_to}.
That way, each time you run the jobs, the query picks up whatever is inside the database.
You of course need to keep date_from and date_to updated.
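A sketch of that pattern (date_params is a hypothetical parameter table, not from the thread; the Set Variables step turns the two fields into ${date_from} and ${date_to} for the next transformation):

-- first transformation: read the range into the stream, then Set Variables
SELECT date_from, date_to FROM date_params;

-- next transformation's query, with variable substitution enabled:
SELECT table1.column1, table2.column2
FROM table1 JOIN table2 ON table1.id = table2.id
WHERE table2.date BETWEEN '${date_from}' AND '${date_to}'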