How to run a SELECT SQL statement stored in a field in Pentaho?

I have a table with a 'query' field containing a SELECT statement and another 'parameters' field containing the SQL parameters. I have merged these two fields into a new field containing a correct SELECT statement. Now I need to execute the statement in this new field, get the result of the SELECT (the output fields) and generate an Excel file.

Use Table-Input if you are interested in a query result set. Table-Input supports SQL parameters, so there is no need to build the statement yourself (e.g. with Replace-In-String) and trip over escaping along the way. There is also variable substitution, in case a single template isn't enough.
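To illustrate why binding parameters beats building the statement string yourself, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in database. This is not Pentaho's API: the customers table and its columns are hypothetical. The `?` placeholders play a role similar to the question marks that Table-Input fills from a previous step.

```python
import sqlite3

def fetch_customers(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # The value is bound as a parameter, so quotes inside it need no escaping.
    cur = conn.execute("SELECT id, name FROM customers WHERE name = ?", (name,))
    return cur.fetchall()

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "O'Brien"), (2, "Smith")])

# A string-built query such as "... WHERE name = 'O'Brien'" would break on the
# embedded quote; the bound parameter handles it transparently.
print(fetch_customers(conn, "O'Brien"))
```

The same idea applies whatever the database: the statement text stays fixed and only the values change.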
Update 21:14 GMT
I'm not very fond of the way you are trying to prepare the SELECT statement, but here goes, assuming it's a single statement:
Create a job with a Start entry and two Transformation entries (T1, T2). Let T1 produce the field containing your SELECT statement and use a Set-Variables step to make the statement available to T2 as the variable SELECT. In T2, use a Table-Input step referencing ${SELECT} in the SQL statement text area. Don't forget to enable the option "Replace variables in script".
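The T1/T2 hand-off can be simulated outside PDI to see what "Replace variables in script" does. This is a minimal sketch under stated assumptions: a plain dict stands in for the job's variable space, Python's string.Template (which happens to use the same ${NAME} syntax) stands in for Kettle's variable replacement, and the orders table is hypothetical. In PDI itself, the job and the Set-Variables step do all of this for you.

```python
import sqlite3
from string import Template

def run_t2(conn: sqlite3.Connection, sql_template: str, variables: dict[str, str]):
    # Mimics "Replace variables in script": ${NAME} is replaced before execution.
    sql = Template(sql_template).substitute(variables)
    cur = conn.execute(sql)
    # Output fields are whatever columns the statement returns.
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()

# Hypothetical source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# T1: the field holding the merged SELECT statement is published as variable SELECT.
variables = {"SELECT": "SELECT id, amount FROM orders WHERE amount > 10"}

# T2: the Table-Input SQL text area contains only ${SELECT}.
columns, rows = run_t2(conn, "${SELECT}", variables)
print(columns, rows)
```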
From here on it's a matter of taste. I would prefer to create a CSV file using Text-File-Output; with the right field separator, Excel will open the file when you double-click it. The advantage of Text-File-Output is that you don't have to specify the fields, which you don't know at design time anyway: an empty field list simply handles all incoming fields. This is comparable to the total projection in a Table-Input, which creates the necessary downstream fields from the retrieved columns.
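The "empty field list" idea, writing whatever columns arrive without declaring them up front, can be sketched in Python with sqlite3 and the csv module. This is an analogy, not Text-File-Output itself; the orders table is hypothetical, and the default `;` separator is an assumption, since the separator Excel expects depends on your regional settings.

```python
import csv
import io
import sqlite3

def write_result_as_csv(conn: sqlite3.Connection, sql: str, out, delimiter: str = ";") -> int:
    """Write every column of the result set to `out`; return the data row count."""
    cur = conn.execute(sql)
    writer = csv.writer(out, delimiter=delimiter)
    # The header is taken from the result set itself, so no field list
    # has to be declared at design time.
    writer.writerow([d[0] for d in cur.description])
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count

# Hypothetical source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

buf = io.StringIO()
n = write_result_as_csv(conn, "SELECT * FROM orders ORDER BY id", buf)
print(n, buf.getvalue().splitlines())
```

When writing to a real file, open it with `newline=""` so the csv module controls the line endings.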
If you must produce an Excel workbook, you'll have to learn about metadata injection. That would be a separate project for a beginner, though. There are samples in your Kettle installation folder. And there is a very active community if you find yourself in trouble.

Related

Use SQL Field in SSIS Variable

Is it possible to reference a SQL field in your SSIS variable?
For instance, I would like to use the field from the "table" below
Select '999999' AS Physician_Profile_ID
as a dynamic variable (named "CMSPhysProID" in our example).
I plan on concatenating multiple IDs into an IN clause.
This is possible with an Execute SQL Task. In the left-hand pane of the Execute SQL Task editor:
1. On the General page, set the result set to Single row.
2. Set the connection type to OLE DB.
3. Set the connection and enter the SQL statement, as you mentioned: Select '999999' AS Physician_Profile_ID
4. Go to the Result Set page in the left-hand pane.
5. Add the variable in which you want to store '999999'.
6. Click OK.
If you are looking to store the value within the variable to be used later, you can simply use an Execute SQL Task with a single row result set. More details in the following article:
SSIS Basics: Using the Execute SQL Task to Generate Result Sets
If you are looking to add a computed column while importing data, you must use a Derived Column Transformation within the data flow task to add a column based on another one. You can refer to the following article for more details about this component:
SSIS Derived Columns with Multiple Expressions vs Multiple Transformations
What are you trying to accomplish by concatenating the IDs into an "IN" statement? If the idea is to use the values of the IDs to limit the results, as a dynamic WHERE clause, you may have better luck just using a lookup against either a table you maintain with the desired IDs or even a static list generated in the package with a script task. (If you can use the lookup table method it will be much easier to maintain as you only have to update a table, not your source code.)
Alternatively, you may even be able to accomplish the goal with a join: create a temp table from the profile IDs you want to keep and join to it, or, again, use it in a Lookup component. Dynamically building a WHERE clause using IN will be a lot slower and cumbersome to maintain.
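The temp-table join can be sketched in plain SQL. Here Python's sqlite3 stands in for SQL Server; the physicians table and its name column are hypothetical, and in SSIS you would typically use a Lookup or Merge Join component instead of hand-written SQL.

```python
import sqlite3

def rows_for_ids_join(conn: sqlite3.Connection, wanted_ids: list[str]) -> list[tuple]:
    # Keep the wanted IDs in a temp table instead of concatenating them into IN (...).
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted_ids (id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM wanted_ids")
    # OR IGNORE drops duplicate IDs instead of failing on the primary key.
    conn.executemany("INSERT OR IGNORE INTO wanted_ids VALUES (?)",
                     [(i,) for i in wanted_ids])
    # The join keeps only the physicians whose ID is in the temp table.
    return conn.execute(
        "SELECT p.Physician_Profile_ID, p.name "
        "FROM physicians AS p JOIN wanted_ids AS w ON p.Physician_Profile_ID = w.id "
        "ORDER BY p.Physician_Profile_ID"
    ).fetchall()

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE physicians (Physician_Profile_ID TEXT, name TEXT)")
conn.executemany("INSERT INTO physicians VALUES (?, ?)",
                 [("999999", "A"), ("123456", "B"), ("555555", "C")])
print(rows_for_ids_join(conn, ["999999", "555555"]))
```

Changing the set of IDs means changing rows in a table, not editing SQL text.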

How do I deal with identically named fields in the source database, differentiated only by label name?

At my organisation, SQL tables are copied onto our SAS server. The SQL tables were set up to run pre-programmed SQL queries, but SAS is now the tool used. This creates an issue: some tables have variable names that are too long for SAS (which limits names to 32 characters) but work in SQL. The label for the source variable is correct and not shortened.
The source column names (in SQL Server):
Consolidated_Arrears_Vs_Portfolio_Balance_Ltd
Consolidated_Arrears_Vs_Portfolio_Balance_Pure
In SAS:
Consolidated_Arrears_Vs_Portfoli
Consolidated_Arrears_Vs_Portfoli
SAS Labels:
Consolidated_Arrears_Vs_Portfolio_Balance_Ltd
Consolidated_Arrears_Vs_Portfolio_Balance_Pure
So, how do I tell the difference in code between these two?
Thanks in advance.
To use the data natively in SAS, one approach would be to write a macro that maps the original SQL names (kept as labels) to the corresponding new SAS names. If the original table names were mangled as well, you have a lot more issues.
Original SQL
select Abracadabra_Magical_Unity_Formation_SequenceId from AMUF_Master
Replace with
select %nameFor(Abracadabra_Magical_Unity_Formation_SequenceId) from AMUF_Master
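The macro %nameFor would either do a dynamic lookup against the tables in the library or, perhaps better when the table design is static, use a fixed mapping table created from a one-time lookup:
* presume SQL data now in libref MIGRATED;
* do once to get the variable metadata that includes LABEL and NAME;
proc sql;
create table static.nameFor as
select * from sashelp.vcolumn
where libname = 'MIGRATED';
quit;
* use as needed: resolves to the SAS NAME whose LABEL is the original SQL name;
%macro nameFor(SQL_Name);
  %local dsid rc name;
  %let dsid = %sysfunc(open(static.nameFor(where=(label="&SQL_Name"))));
  %if &dsid %then %do;
    %let rc = %sysfunc(fetch(&dsid));
    %if &rc = 0 %then %let name = %sysfunc(getvarc(&dsid, %sysfunc(varnum(&dsid, name))));
    %let rc = %sysfunc(close(&dsid));
  %end;
  &name
%mend;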
You could also use static.nameFor to discover all the SQL names that were changed during migration: those are the rows where NAME ne LABEL.
An automated approach would be to write a search-and-replace program that changes a copy of the original SQL queries on hand.
The search and replace would be either:
1. find <long-named column>, replace with %nameFor(<long-named column>), or
2. find <long-named column>, replace with <migrated SAS column name>.
The first way adds noise.
The second way loses some of the original queries' 'true flavor'.
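The search-and-replace program could look roughly like this minimal Python sketch. It assumes you have exported the label-to-name pairs (for example from static.nameFor) into a dict; the function name rewrite_query and the SAS name in the example mapping are hypothetical. It supports both replacement styles.

```python
import re

def rewrite_query(sql: str, name_map: dict[str, str], use_macro: bool = False) -> str:
    """Rewrite long SQL column names in `sql`.

    name_map maps each original SQL name (kept as the SAS label) to its SAS name.
    use_macro=False substitutes the SAS name; use_macro=True wraps the original
    name in %nameFor(...) instead.
    """
    # Only names that actually changed during migration need rewriting.
    renamed = {label.lower(): name for label, name in name_map.items() if label != name}
    if not renamed:
        return sql
    # Longest names first, plus \b boundaries, so a name never matches
    # inside a longer identifier. SQL identifiers are case-insensitive.
    alternatives = sorted(renamed, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b",
                         re.IGNORECASE)

    def replace(match):
        original = match.group(1)
        return f"%nameFor({original})" if use_macro else renamed[original.lower()]

    return pattern.sub(replace, sql)

# Hypothetical mapping; the SAS name shown is illustrative only.
name_map = {"Abracadabra_Magical_Unity_Formation_SequenceId": "Abracadabra_Magical_Unity_Format"}
sql = "select Abracadabra_Magical_Unity_Formation_SequenceId from AMUF_Master"
print(rewrite_query(sql, name_map))
print(rewrite_query(sql, name_map, use_macro=True))
```

In macro mode the name is kept exactly as written in the query, so it must match the label's case for the %nameFor lookup to succeed.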

How to run SQL stored in a field in Pentaho?

I have a table with a 'query' field containing a SELECT statement and another 'parameters' field containing the SQL parameters. I have already merged these two fields into a new field containing a correct SELECT statement. Now I need to execute the statement in this new field, get the result of the SELECT (the output fields) and generate an Excel file.
In your transformation, you can use the Execute row SQL script step (under Scripting), which does exactly what you want: it executes the SQL contained in the selected input field for each row passed to the step.
Please do consider whether that's really what you want. It works fine for a small set of rows or complicated logic, but is very inefficient compared to batch inserts and updates.
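The difference between per-row execution and a batch can be sketched with Python's sqlite3 as a stand-in database; the target table and the generated data are hypothetical, and this is not how the PDI steps are implemented internally. The per-row version parses and runs a separate statement for every row, while the batched version prepares one parameterized INSERT and binds all values in a single call.

```python
import sqlite3

def run_per_row(conn: sqlite3.Connection, statements: list[str]) -> None:
    # Row-by-row: each incoming row carries its own SQL text, which is
    # parsed and executed separately.
    for sql in statements:
        conn.execute(sql)
    conn.commit()

def run_batched(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    # Batched: one parameterized INSERT, executed for all rows in a single call.
    conn.executemany("INSERT INTO target (id, val) VALUES (?, ?)", rows)
    conn.commit()

def new_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER, val TEXT)")
    return conn

rows = [(i, f"v{i}") for i in range(1000)]
# The per-row variant needs the values baked into each statement's text.
statements = [f"INSERT INTO target (id, val) VALUES ({i}, '{v}')" for i, v in rows]

a, b = new_db(), new_db()
run_per_row(a, statements)
run_batched(b, rows)
print(a.execute("SELECT COUNT(*) FROM target").fetchone(),
      b.execute("SELECT COUNT(*) FROM target").fetchone())
```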
If your SQL statements do look like "INSERT [some data]" and you have many rows to process, consider splitting the streams with filter steps and applying calculations or constants to each case to set the values correctly, then directing all to a table output step.

How to remove Nulls from Save As in SQL Server Management Studio

I have created a table variable inside a stored procedure. At the end of the procedure I select all the rows in the table and display them. When I right-click on the headers and select "Save As", I can change the type to All Files and save the file as a text file. This works fine, except that columns containing NULLs are saved as NULL. I want the NULLs filled in with spaces.
I've been trying to find a way to create the file from a stored procedure, but most sources say to use SSIS, and I can't figure out how to use SSIS with a table variable instead of an actual table.
If I could either replace the NULLs with spaces or use a stored procedure to do the same thing, it would be great. I cannot use tab- or comma-delimited output, as the final product has to be a flat file in which each column uses the same number of characters as declared in the column headers, padded with spaces.
Thanks for any help you are able to offer.
Cheers
P.S. I am using SQL Server 2012 Management Studio
The easy way to do this would be to convert the NULLs to spaces in your SELECT statement.
SELECT COALESCE(yourcolumn, '')
Wrap every column that can contain NULLs in COALESCE.
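Combining the COALESCE idea with the fixed-width requirement from the question could look like this minimal sketch. Python's sqlite3 stands in for SQL Server, the people table and the column widths are hypothetical, and truncating values longer than their width is an assumption about the target file format.

```python
import sqlite3

def fixed_width_lines(conn: sqlite3.Connection, sql: str, widths: list[int]) -> list[str]:
    lines = []
    for row in conn.execute(sql):
        # COALESCE in the query has already turned NULLs into ''; ljust pads each
        # value with spaces to its column width, and [:w] truncates longer values.
        lines.append("".join(str(v)[:w].ljust(w) for v, w in zip(row, widths)))
    return lines

# Hypothetical sample data with NULLs in both columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Ann", None), (None, "Lee")])
sql = ("SELECT COALESCE(first_name, ''), COALESCE(last_name, '') "
       "FROM people ORDER BY rowid")
for line in fixed_width_lines(conn, sql, [6, 5]):
    print(repr(line))
```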
If the last thing you do in the stored procedure is SELECT * FROM TempTable, then you can use that SP in an OLE DB Source component. Change the access mode from Table or view to SQL command and use the EXEC sp_SomeName syntax. This creates a data flow path that you can connect to a destination component, such as a Flat File Destination.
I have seen many issues over the years doing Save Results As... I will only use this for informal 'quick check' files and not for anything considered 'live' or 'production' data.
Here is a good blog that also shows how to use parameters.
http://geekswithblogs.net/stun/archive/2009/03/05/mapping-stored-procedure-parameters-in-ssis-ole-db-source-editor.aspx

Capture executed SQL from Table Input in Pentaho PDI

I am using Pentaho for data migration testing. I have set up a Table Input step in which many parts of the query are variables. I have been looking for a way to capture that query as it is executed at runtime.
I was wondering whether there is a specific system or log variable for the SQL, or whether this has to do with metadata. Thanks
Maybe the following approach will help:
We assume a transformation that reads a CSV file to get the dynamic portion of the SELECT statement (e.g. the columns) and stores it in the variable columns.
The second transformation uses this variable to generate the SELECT statement and stores it in the variable sql_statement.
In the main transformation, we use ${sql_statement} as the SELECT statement of the Table Input and write the data to an output file (the business process, so to speak). From the same input, we also copy the output to a second path. There we add the current time as a field (using the Get System Data step) and the generated SQL statement, join them as a Cartesian product and group the result by sql_statement. That way we can compute the first and the last time the statement was used. These results are written to a text file.
The last thing we need is a job calling the three transformations sequentially.
This is a sample output:
sql_statement;min_time;max_time
SELECT my_column FROM test_table;2014/05/08 00:41:21.143;2014/05/08 00:41:21.144
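The grouping on the second path boils down to a per-statement minimum and maximum of the timestamp. This minimal Python sketch mimics that Group By outside PDI; the function names are hypothetical, and the two timestamps are taken from the sample output above.

```python
from datetime import datetime

def summarize(log):
    """Group (sql_statement, timestamp) pairs by statement -> (min_time, max_time)."""
    summary = {}
    for stmt, ts in log:
        first, last = summary.get(stmt, (ts, ts))
        summary[stmt] = (min(first, ts), max(last, ts))
    return summary

def to_lines(summary):
    def fmt(ts):
        # Millisecond precision, matching the sample output format.
        return ts.strftime("%Y/%m/%d %H:%M:%S.%f")[:-3]
    lines = ["sql_statement;min_time;max_time"]
    for stmt, (first, last) in summary.items():
        lines.append(f"{stmt};{fmt(first)};{fmt(last)}")
    return lines

# One (statement, current time) pair per row that passed through the second path.
log = [
    ("SELECT my_column FROM test_table", datetime(2014, 5, 8, 0, 41, 21, 144000)),
    ("SELECT my_column FROM test_table", datetime(2014, 5, 8, 0, 41, 21, 143000)),
]
for line in to_lines(summarize(log)):
    print(line)
```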
Thank you, Marcus! I did something similar.
It works. Awesome.
I gathered the parts of the queries from the table field where they were stored and assembled the full query in JavaScript. The full query is then sent as a parameter to a transformation that runs and logs it.