SQL Transformation in Informatica for Google Bigquery - sql

I have a SQL Script with multiple drop & create DDL(Create tables As Select *), I want to run them at one go. I am quite new to informatica powercenter, can some one provide the process of using SQL transformation for BigQuery in informatica.
Sample Query:-
drop table if exists sellout.account_table;
CREATE TABLE sellout.account_table
AS
SELECT * FROM
sellout.account_src
WHERE
UPPER(account_name) IN ('RANDOM');
Similar to the above queries i have around 24 SQL's in a script.
I want to run them at once and later make them as part of informatica job.

If the "PowerExchange Google BigQuery" server and client are installed and after executing the infasetup.bat(sh) validateandregisterallfeatures, the mappings would be opened/exported successfully.
Here are some FAQs that might be handy for you:
Q: Why are the output fields in SQL Transformation not seen?
​A: Stored Procedure selected in the SQL Transformation must have output parameters declared. Else it would not have output fields other than default Return Code column.
​​
Q: A set of columns are displayed as result while running the Stored Procedure, however, you still do not see the same columns as output in SQL Transformation. Why?
A: Columns seen in the output might not be defined/declared as output parameters in the Stored Procedure. Procedure might have 'SELECT * FROM' like statement, which retrieves the data when the procedure is run from DB UI and a similar result could be seen when the procedure is run programmatically.
However, to call the same procedure from SQL Transformation, explicitly declared output parameters should be present as the transformation imports the metadata of the proc when selected. Unless you declare the output parameters explicitly in the procedure, it cannot be seen as output in the transformation.
​Q: Is it necessary to have input/output parameters in Stored Procedure to call it from SQL Transformation?
A: Yes, it is necessary to have input/output parameters in Stored Procedure if it is not having default ones. As these parameters appear as input/output fields in SQL transformation, without these Mapping becomes invalid.
Q: I have SELECT statement in the procedure, does the SQL transformation can push this to next transforamtion?
A: Approprioate output parameters are required for this to work.

Related

Get a Big Query script to output a table

I am using the new scripting feature of Big Query to declare a variable and then am using that variable in a standard SQL query.
The structure of the query is :
DECLARE {name of variable} {data type};
SET {name of variable} = {Value}'
(A SQL QUERY THEN FOLLOWS USING THE ABOVE VARIABLE)
I understand that this is now a script a no longer a typical query, and thus when I run it, it runs as a sequence of executable tasks. But is there anyway in the script to explicitly state that I only want to output the resulting table of the SQL query as opposed to both the result of declaring the variable and SQL query?
What BQ Outputs
Depending on how you "capture" the output, if you are sending a query from Python/Java/CLI, then the last SELECT statement in script is the only output that you receive with the API.
Please also note that each "output" that you see come with a cost/bytes-billed, which is another reason for them to be visible at all time.
Update:
If you need to capture the output of SELECT statement to a table, depending on your intention, you may use:
CREATE OR REPLACE TABLE <your_destination_table> AS SELECT ...
or
INSERT INTO TABLE <your_destination_table> SELECT ...

Dynamic parameter value in OLEDB Source

I had asked a question earlier this day and this is another question from me.
Well, there is my Execute SQL Task which assigns a result (single value of type int) to a parameter. After this I have a DFT inside which there is an OLEDB Source. I need to execute a stored procedure in the oledb source which should get the parameter value from earlier exec sql task resultset variable. This will give me a resultset and I need to load the same into another table.
My question is, I am not able to view the column list because of the dynamic sql and hence unable to map the destination columns. How best should I proceed in this case? Is this a good approach?
Since the stored procedure was using a dynamic parameter, I used the "WITH RESULT SETS" option in the OLEDB Source and it seems to work fine

capture executed sql from input table in pentaho pdi

I am using pentaho for data migration testing. I have set a "table input" step where many parts of the query inside "table inputs" are variables. I have been looking for a way to capture that query after it gets executed during runtime.
I was wondering if there is any specific system log variables for sql or is it to do with metadata. need help! Thanks
Maybe the following approach will help:
We assume a transformation reading a CSV file to get the dynamic portion of the SELECT statement (e.g. the columns) and setting the variable columns with it.
The second transformation uses this variable to generate the SELECT statement and store it into the variable sql_statement.
In the main transformation we use ${sql_statement} as the SELECT statement of the table input and write the data to an output file (that's the business process so to say). From the same input we copy the output to another path. There we add the current time as a field (use element "Get system data") and we add the generated SQL statement, join them as a cartesian product and group the result by the sql_statement. That way we can compute the first time and the last time that the statement was used. These results are written to a text file.
The last thing we need is a job calling the three transformations sequentially.
This is a sample output:
sql_statement;min_time;max_time
SELECT my_column FROM test_table;2014/05/08 00:41:21.143;2014/05/08 00:41:21.144
Thank you Marcus! I did some thing similar.
It works. awesome.
I gathered parts of queries from table field where they were kept and formed a full query in javascript. After that full query will be sent as parameter to a transformation that will run and log the query.

Using dynamic SQL in an OLE DB source in SSIS 2012

I have a stored proc as the SQL command text, which is getting passed a parameter that contains a table name. The proc then returns data from that table. I cannot call the table directly as the OLE DB source because some business logic needs to happen to the result set in the proc. In SQL 2008 this worked fine. In an upgraded 2012 package I get "The metadata could not be determined because ... contains dynamic SQL. Consider using the WITH RESULT SETS clause to explicitly describe the result set."
The problem is I cannot define the field names in the proc because the table name that gets passed as a parameter can be a different value and the resulting fields can be different every time. Anybody encounter this problem or have any ideas? I've tried all sorts of things with dynamic SQL using "dm_exec_describe_first_result_set", temp tables and CTEs that contains WITH RESULT SETS, but it doesn't work in SSIS 2012, same error. Context is a problem with a lot of the dynamic SQL approaches.
This is latest thing I tried, with no luck:
DECLARE #sql VARCHAR(MAX)
SET #sql = 'SELECT * FROM ' + #dataTableName
DECLARE #listStr VARCHAR(MAX)
SELECT #listStr = COALESCE(#listStr +',','') + [name] + ' ' + system_type_name FROM sys.dm_exec_describe_first_result_set(#sql, NULL, 1)
exec('exec(''SELECT * FROM myDataTable'') WITH RESULT SETS ((' + #listStr + '))')
So I ask out of kindness, by why on God's green earth are you using an SSIS Data Flow task to handle dynamic source data like this?
The reason you're running into trouble is because you're perverting every purpose of an SSIS Data flow task:
to extract a known source with known metadata that can be statically typed and cached in design-time
to run through a known process with straightforward (and ideally asynchronous) transformations
to take that transformed data and load it into a known destination also with known metadata
It's fine to have parameterized data sources that bring back different data. But to have them bring back entirely different metadata each time with no congruity between the different sets is, frankly, ridiculous, and I'm not entirely sure I want to know how you handled all your column metadata in the working 2008 package.
This is why it wants you add a WITH RESULTS SET to the SSIS query - so it can generate some metadata. It doesn't do this at runtime - it can't! It has to have a known set of columns (because it aliases them all into compiled variables anyway) to work with. It expects the same columns every time it runs that Data Flow Task - the exact same columns, down to the names, the types, and the constraints.
Which leads to one (terrible, terrible) solution - just stick all the data into a temporary table with Column1, Column2 ... ColumnN and then use the same variable you're using as the table name parameter to conditionally branch your code and do whatever you want with the columns.
Another more sane solution would be to create a data flow task for each of your source tables, and use your parameter in a precedence constraint to just pick which data flow task should run.
For a solution this poorly tailored for an out-of-the-box ETL, you should also highly consider just rolling your own in C# or a script task instead of the Data Flow Task provided by SSIS.
In short, please don't do this. Think of the children (packages)!
I've used CozyRoc Dynamic DataFlow Plus to achieve this.
Using configuration tables to build the SQL Select statements, I have a single SSIS package that loads data from Oracle and Sybase (or any OLEDB source) to MS SQL. Some of the result sets are in the millions of rows and performance is excellent.
Instead of writing a new package every time a new table is needed, this can be configured in minutes and run on a the pre-tested and robust existing package.
Without it I would have been up for writing hundreds of packages.

Mysql - modify result dataset

I have a program (flex) that queries a database and gets back a dataset (value, timestamp). In the program I then put each value in the set through an algorithm to get a new value resulting in all the values being transformed. Instead of having to do this transformation of data in my program I would like mysql to do it and send the results back. Essentially, I would like to do a SELECT statement that returns a modified dataset.
Assuming the computation can only be done in a stored procedure or function, yes - as long as you are using MySQL 5.0+. I recommend reading this article on MySQL stored procedures & functions.