I am using the new scripting feature of BigQuery to declare a variable and then use that variable in a standard SQL query.
The structure of the query is:
DECLARE {name of variable} {data type};
SET {name of variable} = {value};
(A SQL QUERY THEN FOLLOWS USING THE ABOVE VARIABLE)
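For example, a concrete script following this structure might look like the sketch below (project, dataset, table, and variable names are placeholders):
-- Minimal sketch; all names are placeholders.
DECLARE cutoff_date DATE;
SET cutoff_date = DATE '2020-01-01';

SELECT *
FROM `my_project.my_dataset.my_table`
WHERE record_date > cutoff_date;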
I understand that this is now a script and no longer a typical query, and thus when I run it, it runs as a sequence of executable tasks. But is there any way in the script to explicitly state that I only want to output the resulting table of the SQL query, as opposed to both the result of declaring the variable and the result of the SQL query?
What BQ Outputs
How you "capture" the output matters: if you are sending the query from Python, Java, or the CLI, then the last SELECT statement in the script is the only output that you receive through the API.
Please also note that each "output" that you see comes with a cost (bytes billed), which is another reason for them to be visible at all times.
Update:
If you need to capture the output of SELECT statement to a table, depending on your intention, you may use:
CREATE OR REPLACE TABLE <your_destination_table> AS SELECT ...
or
INSERT INTO <your_destination_table> SELECT ...
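For example, combining the variable from the question with a destination table (a minimal sketch, all names are placeholders):
-- Writes the result of the SELECT to a table instead of returning it as query output.
DECLARE cutoff_date DATE;
SET cutoff_date = DATE '2020-01-01';

CREATE OR REPLACE TABLE `my_project.my_dataset.filtered_records` AS
SELECT *
FROM `my_project.my_dataset.my_table`
WHERE record_date > cutoff_date;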
Related
I have a SQL script with multiple drop & create DDL statements (CREATE TABLE AS SELECT *), and I want to run them in one go. I am quite new to Informatica PowerCenter; can someone describe the process of using a SQL transformation for BigQuery in Informatica?
Sample query:
DROP TABLE IF EXISTS sellout.account_table;
CREATE TABLE sellout.account_table AS
SELECT *
FROM sellout.account_src
WHERE UPPER(account_name) IN ('RANDOM');
Similar to the above query, I have around 24 SQL statements in a script.
I want to run them all at once and later make them part of an Informatica job.
If the "PowerExchange Google BigQuery" server and client are installed and after executing the infasetup.bat(sh) validateandregisterallfeatures, the mappings would be opened/exported successfully.
Here are some FAQs that might be handy for you:
Q: Why are the output fields in SQL Transformation not seen?
A: The stored procedure selected in the SQL Transformation must have output parameters declared. Otherwise it will not have any output fields other than the default Return Code column.
Q: A set of columns is displayed as the result when running the stored procedure, yet you still do not see the same columns as output in the SQL Transformation. Why?
A: The columns seen in the output might not be defined/declared as output parameters in the stored procedure. The procedure might contain a statement like 'SELECT * FROM ...', which retrieves the data when the procedure is run from the database UI, and a similar result can be seen when the procedure is run programmatically.
However, to call the same procedure from a SQL Transformation, explicitly declared output parameters must be present, because the transformation imports the metadata of the procedure when it is selected. Unless you declare the output parameters explicitly in the procedure, they cannot be seen as output in the transformation.
Q: Is it necessary to have input/output parameters in Stored Procedure to call it from SQL Transformation?
A: Yes, it is necessary to have input/output parameters in the stored procedure if it does not have default ones. These parameters appear as input/output fields in the SQL Transformation; without them, the mapping becomes invalid.
Q: I have a SELECT statement in the procedure; can the SQL Transformation push its result to the next transformation?
A: Appropriate output parameters are required for this to work.
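For reference, a minimal sketch of a BigQuery stored procedure with an explicitly declared output parameter, the kind the SQL Transformation can expose as an output field (the procedure name and logic are hypothetical; the table is borrowed from the sample query above):
-- Hypothetical procedure; the OUT parameter is what becomes an output field in the SQL Transformation.
CREATE OR REPLACE PROCEDURE sellout.get_account_count(IN p_account_name STRING, OUT account_count INT64)
BEGIN
  SET account_count = (
    SELECT COUNT(*)
    FROM sellout.account_src
    WHERE UPPER(account_name) = UPPER(p_account_name)
  );
END;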
I am using a SELECT UpdateDate FROM dbo.log command in an Execute SQL Task. I'm fairly new to this, so please bear with me. I want to store the value as a variable and then pass it into the WHERE clause of a subsequent data flow. My questions are:
What is the correct way to set up the Execute SQL Task? In General I have the OLE DB connection and Direct input with the query above. Result Set is set to Single row, and I am storing this to a variable I have created called User::UpdateDate. For some reason this doesn't work.
I then want to call this date in a data flow, i.e. SELECT * FROM Users WHERE RecordDate > User::UpdateDate. I believe the syntax is different for this.
I would really appreciate some help with this. Many thanks in advance.
In your Execute SQL Task Editor, configure the Parameter Mapping as shown below; use your own variable, of course (in this example I'm using PackageStartTime).
Then use the following SQL statement:
SELECT * FROM Users WHERE RecordDate > ?
To save a value from a SQL statement, you will need to set the Result Set to Single row and configure the result set as shown in the example below:
Execute SQL Task with ResultSet
First of all, create a variable of type DateTime, for example #[User::UpdateDate].
Add an Execute SQL Task, select the OLEDB connection, and use the following command as the SQL statement:
SELECT TOP 1 UpdateDate FROM dbo.log ORDER BY UpdateDate DESC
Set the ResultSet property to Single row, and in the Result Set tab add a row with the following values:
Result Name = 0 (which means the first column)
Variable Name = #[User::UpdateDate]
Additional Information
SSIS Basics: Using the Execute SQL Task to Generate Result Sets
OLEDB Source with parameterized SQL Command
Inside the Data Flow Task, add an OLEDB Source and set the Access Mode to SQL Command. Then write the following command:
SELECT * FROM Users WHERE RecordDate > ?
Click on the Parameters button and map the variable #[User::UpdateDate] as the first parameter.
Additional Information
Map Query Parameters to Variables in a Data Flow Component
Parameterized OLEDB source query
I am currently using a function in SQL Server to get the max value of a certain column. I need this value to generate a specific number of dummy entries to reserve IDs for flowfiles that are created later on.
Is there a way of calling this function via a NiFi processor?
When using ExecuteSQL I always get errors like "unable to execute SQL select query" or that the column "ab" was not found, when using select ab.functionname() (ab is the login name of the DB).
In SQL Server I can just use select ab.functionname() and get the desired results.
If there is no way of calling this function, is there another way to create #flowfiles dummy entries to reserve their place in the DB, so that no one else can insert or use these IDs (not autoincrement, because that is not possible) while the flowfiles are being processed?
I tried using $flowfile.count and the Counter processor, but this did not solve the problem.
It should look like INSERT INTO table (id, nr) VALUES (max(id)+1, anynumber) for every flowfile; unfortunately, ExecuteSQL is not able to do this.
I think this conversation can help you:
https://community.hortonworks.com/questions/26170/does-executesql-processor-allow-to-execute-stored.html
Gist:
You can use ExecuteScript or ExecuteProcess to call an appropriate script. For example, for ExecuteProcess just call the sqlplus command: choose "sqlplus" as the command, and in the command arguments set something like user_id/password@dbname @"script_path/someScript.sql". In someScript.sql you put something like:
execute spname(param)
You can also write your own processor :) Of course that is more difficult and often unnecessary.
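If the goal is only the reserving INSERT from the question, one option is to wrap it in a stored procedure and call that from such a script. A minimal SQL Server sketch (table and procedure names are hypothetical, and this assumes SQL Server 2016 SP1 or later for CREATE OR ALTER):
-- Hypothetical procedure; the lock hints keep concurrent callers from reserving the same id.
CREATE OR ALTER PROCEDURE dbo.ReserveDummyRow
    @nr INT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.flowfile_table (id, nr)
    SELECT ISNULL(MAX(id), 0) + 1, @nr
    FROM dbo.flowfile_table WITH (UPDLOCK, HOLDLOCK);
END;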
I have a table with a 'query' field containing a SELECT SQL statement and another 'parameters' field containing the SQL parameters. I have already merged these two fields into a new field containing a correct SELECT SQL statement. Now I need to execute the SQL in this new field, get the returned output fields, and generate an Excel file.
In your transformation, you can use the Execute row SQL script step (under Scripting), which does exactly what you want: it executes the SQL contained in the selected input field for each row passed to the step.
Please do consider whether that's really what you want. It works fine for a small set of rows or complicated logic, but it is very inefficient compared to batch inserts and updates.
If your SQL statements look like "INSERT [some data]" and you have many rows to process, consider splitting the streams with filter steps and applying calculations or constants to each case to set the values correctly, then directing all of them to a Table Output step.
I am using Pentaho for data migration testing. I have set up a "Table input" step where many parts of the query inside the "Table input" are variables. I have been looking for a way to capture that query after it gets executed at runtime.
I was wondering if there are any specific system log variables for SQL, or whether this has to do with metadata. Need help! Thanks.
Maybe the following approach will help:
We assume a transformation reading a CSV file to get the dynamic portion of the SELECT statement (e.g. the columns) and setting the variable columns with it.
The second transformation uses this variable to generate the SELECT statement and store it into the variable sql_statement.
In the main transformation we use ${sql_statement} as the SELECT statement of the table input and write the data to an output file (that's the business process, so to speak). From the same input we copy the output to another path. There we add the current time as a field (using the "Get system data" element) and we add the generated SQL statement, join them as a cartesian product, and group the result by sql_statement. That way we can compute the first time and the last time that the statement was used. These results are written to a text file.
The last thing we need is a job calling the three transformations sequentially.
This is a sample output:
sql_statement;min_time;max_time
SELECT my_column FROM test_table;2014/05/08 00:41:21.143;2014/05/08 00:41:21.144
Thank you Marcus! I did something similar.
It works, awesome.
I gathered parts of queries from the table field where they were kept and formed a full query in JavaScript. After that, the full query is sent as a parameter to a transformation that runs and logs the query.