Using Jinja template variables with BigQueryOperator in Airflow - google-bigquery

I'm attempting to use the BigQueryOperator in Airflow, using a variable to populate the sql= attribute. The problem I'm running into is that the file extension is dropped when using Jinja variables. I've set up my code as follows:
dag = DAG(
    dag_id='data_ingest_dag',
    template_searchpath=['/home/airflow/gcs/dags/sql/'],
    default_args=DEFAULT_DAG_ARGS
)

bigquery_transform = BigQueryOperator(
    task_id='bq-transform',
    write_disposition='WRITE_TRUNCATE',
    sql="{{dag_run.conf['sql_script']}}",
    destination_dataset_table='{{dag_run.conf["destination_dataset_table"]}}',
    dag=dag
)
The passed variable contains the name of the SQL file stored in the separate SQL directory. If I pass the value as a static string, sql="example_file.sql", everything works fine. However, when I pass example_file.sql using a Jinja template variable, it automatically drops the file extension and I receive this error:
BigQuery job failed.
Final error was: {u'reason': u'invalidQuery', u'message': u'Syntax error: Unexpected identifier "example_file" at [1:1]', u'location': u'query'}
Additionally, I've tried hardcoding ".sql" onto the end of the variable, anticipating that the extension would be dropped. However, this causes the entire variable reference to be interpreted as a string.
How do you use variables to populate BigQueryOperator attributes?

Reading the BigQuery operator docstring, it seems that you can provide the SQL statement in two ways:
1. As a string that can contain templating macros
2. As a reference to a file that can contain templating macros (the file contents, not the file name)
You can template the SQL statement but not the file name. In fact, your error message shows that BigQuery did not recognize the identifier "example_file": if you inspect the BigQuery job history for the project that ran the query, you will see that the query string was "example_file.sql", which is not a valid SQL statement, hence the error.
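To make the distinction concrete, here is a minimal sketch of the two supported forms (the task ids and the source_table conf key are made up for illustration, not taken from the question):

# 1. An inline SQL string that itself contains templating macros -- rendered at runtime.
bigquery_inline = BigQueryOperator(
    task_id='bq-transform-inline',
    sql="SELECT * FROM `{{ dag_run.conf['source_table'] }}`",  # hypothetical conf key
    write_disposition='WRITE_TRUNCATE',
    destination_dataset_table='{{ dag_run.conf["destination_dataset_table"] }}',
    dag=dag
)

# 2. A static file name ending in .sql -- the file's contents are templated and the file is
#    resolved against template_searchpath, but the name itself cannot be a Jinja expression.
bigquery_from_file = BigQueryOperator(
    task_id='bq-transform-file',
    sql='example_file.sql',
    write_disposition='WRITE_TRUNCATE',
    destination_dataset_table='{{ dag_run.conf["destination_dataset_table"] }}',
    dag=dag
)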

Related

In a Matillion query, I am able to create a query profile correctly but have errors with the parameters: "The value of the attribute could not be accessed"

I am trying to extract data from Gainsight to Snowflake with Matillion, using the API. I was able to create the query profile and the data is pulled correctly, but I get errors in the parameters. The error is: "Exception Testing table - The error was ***. The value of the attribute could not be accessed. The attribute does not exist."
I tried escaping [ with \ as described in the link below, but it did not work: https://metlcommunity.matillion.com/s/question/0D54G00007uPCSSSA4/i-new-to-api-and-i-am-getting-below-error-while-running-the-api-query-componentparameter-validation-failure-the-value-of-the-attribute-could-not-be-accessed-the-attribute-does-not-existi-can-successfully-create-the-api-query-profile
I was expecting data to show under the "Data Preview".

dbt - no output on variable flags.WHICH

My issue is that when I reference the variable {{ flags.WHICH }} via Jinja, it returns no output.
I am trying to use this variable to find out what type of command dbt is currently running (a run, a test, generate, etc.).
I am using dbt 0.18.1 with the Spark adapter.
flags.WHICH was not introduced until dbt 1.0. You'll have to upgrade to get that feature. Here is the source for the flags module, if you're interested in the flags available in your version.
Note that in Jinja, referencing an undefined variable simply templates to an empty string and does not raise an exception.
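You can reproduce this behavior with plain Jinja2 outside of dbt (a standalone sketch; the FULL_REFRESH attribute is just an illustrative stand-in for a flag that does exist in older versions):

from types import SimpleNamespace
from jinja2 import Template

# 'flags' exists but has no WHICH attribute, as in dbt 0.18.x.
flags = SimpleNamespace(FULL_REFRESH=False)

# The missing attribute renders as an empty string instead of raising an error.
print(Template("which={{ flags.WHICH }}").render(flags=flags))  # prints "which="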

In Pentaho, how do you pass a text file that contains all of the connection parameter definitions to the job?

I am using a JDBC connection and passing parameters such as ${sample_db_connection}. These parameters are defined on the server in a text file, for example sample_db_connection=localhost. I want to pass the text file in a job step so that whenever the job runs and encounters this parameter, it automatically takes the value defined in the text file.
You need to create a KTR file using a "Property Input" step as the input and a "Modified Java Script" step to define the key-value mapping. Check the image below:
Define your filename in the input step. In the JS step, you can use the "setVariable" function to define the key-value mapping.
Once this is executed at the start of the job, Pentaho will set the variables for all the connections.
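For reference, the text file being read might look something like this (only sample_db_connection comes from the question; the other entries are hypothetical examples):

# connection parameters read by the Property Input step
sample_db_connection=localhost
sample_db_port=5432
sample_db_user=etl_user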
Hope I have understood the question correctly and this is what you are looking for! :)

using SSIS to copy a file

When trying to run a task to copy a file, I'm getting this message:
TITLE: Package Validation Error
------------------------------
Package Validation Error
------------------------------
ADDITIONAL INFORMATION:
Error at File System Task: Failed to lock variable "C:\Users\agordon\amtu2\DocumentTransport\production\reports\ORDER18940610353.txt" for read access with error 0xC0010001 "The variable cannot be found. This occurs when an attempt is made to retrieve a variable from the Variables collection on a container during execution of the package, and the variable is not there. The variable name may have changed or the variable is not being created.".
Error at File System Task [File System Task]: An error occurred with the following error message: "Failed to lock variable "C:\Users\agordon\amtu2\DocumentTransport\production\reports\ORDER18940610353.txt" for read access with error 0xC0010001 "The variable cannot be found. This occurs when an attempt is made to retrieve a variable from the Variables collection on a container during execution of the package, and the variable is not there. The variable name may have changed or the variable is not being created.".
".
Error at File System Task: There were errors during task validation.
(Microsoft.DataTransformationServices.VsIntegration)
------------------------------
BUTTONS:
OK
------------------------------
My schema looks like this:
Here are the properties of the failing component:
Here is the expressions section of the failing component:
And finally, here are my parameters:
What am I doing wrong?
Should I completely eliminate this File System Task and replace it with a C# Script Task to copy the file?
Is there something obviously wrong with my process?
I apologize for the size of the images; I think Stack Overflow resizes them. The originals are here:
http://screencast.com/t/PvjBHWWHQ8
http://screencast.com/t/JWfs2n2uD8mu
http://screencast.com/t/T68ttqHo
http://screencast.com/t/89KCF8B0qBd
The properties page is asking for a variable name, and you are providing a file path. Do you have a variable in your SSIS package that can hold the fully qualified file name?
Your SourceVariable property (in screenshot #2) should refer to a variable name, not the actual value of the variable.

bq CLI says my JSON schema is invalid while the browser GUI says it's fine. Where am I going wrong?

I have a JSON schema:
[{"name":"timestamp","type":"integer"},{"name":"xml_id","type":"string"},{"name":"prod","type":"string"},{"name":"version","type":"string"},{"name":"distcode","type":"string"},{"name":"origcode","type":"string"},{"name":"overcode","type":"string"},{"name":"prevcode","type":"string"},{"name":"ie","type":"string"},{"name":"os","type":"string"},{"name":"payload","type":"string"},{"name":"language","type":"string"},{"name":"userid","type":"string"},{"name":"sysid","type":"string"},{"name":"loc","type":"string"},{"name":"impetus","type":"string"},{"name":"numprompts","type":"record","mode":"repeated","fields":[{"name":"type","type":"string"},{"name":"count","type":"integer"}]},{"name":"rcode","type":"record","mode":"repeated","fields":[{"name":"offer","type":"string"},{"name":"code","type":"integer"}]},{"name":"bw","type":"string"},{"name":"pkg_id","type":"string"},{"name":"cpath","type":"string"},{"name":"rsrc","type":"string"},{"name":"pcode","type":"string"},{"name":"opage","type":"string"},{"name":"action","type":"string"},{"name":"value","type":"string"},{"name":"other","type":"record","mode":"repeated","fields":[{"name":"param","type":"string"},{"name":"value","type":"string"}]}]
(http://jsoneditoronline.org/ for pretty print)
When loading through the browser GUI, the schema is accepted as valid. The CLI throws the following error:
BigQuery error in load operation: Invalid schema entry: "fields":[{"name":"type"
Is there something wrong with my schema as specified?
If you are passing the schema as JSON, you should write it to a file and pass the file name as the schema parameter. Passing the schema inline on the command line is only allowed for simple, flat schemas.
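For example, something along these lines (the dataset, table, and file names are placeholders, not taken from the question):

# save the JSON schema to its own file, then pass the file path as the schema argument
bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable ./data.json ./schema.json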