I am trying to set a date variable within a dbt model to be the date 7 days ago. The model will run against a Redshift database. I have done the following to set the variable; however, I get the error "DATE_ADD is not defined":
{%- set start_date = TRUNC(DATE_ADD(day, -7, CURRENT_DATE)) -%}
What is the correct way to set the variable?
Jinja is a templating language. When you run dbt, it first executes the Jinja to "compile" your model, and then it executes your compiled code against your database.
Jinja doesn't have functions called trunc, date_add, or current_date, since those are SQL functions.
You have two choices:
1. Set the variable equal to a string and include that string in your model code, so that the database calculates this date. That would look like this (note the extra quotes):
{%- set start_date = "TRUNC(DATE_ADD(day, -7, CURRENT_DATE))" -%}
select {{ start_date }}
If you compile this and check the code generated in your target directory, you'll see it becomes this SQL:
select TRUNC(DATE_ADD(day, -7, CURRENT_DATE))
2. Use Jinja's context to calculate the date and include the date literal in your SQL. dbt's Jinja context includes a special variable called run_started_at, and also Python's datetime module. Putting those together looks like this:
{%- set start_datetime = run_started_at - modules.datetime.timedelta(days=7) -%}
{%- set start_date = start_datetime.strftime("%Y-%m-%d") -%}
select '{{ start_date }}'
This will compile to:
select '2023-01-12'
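If the goal is to filter by that date rather than select it, the same variable drops into a WHERE clause. A minimal sketch, where my_events and created_at are placeholder names, not part of the original question:
{%- set start_datetime = run_started_at - modules.datetime.timedelta(days=7) -%}
{%- set start_date = start_datetime.strftime("%Y-%m-%d") -%}

select *
from {{ ref('my_events') }}
where created_at >= '{{ start_date }}'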
I have a query I want to run using the BigQueryOperator. Each day, it will run for the past 21 days. The sql file stays the same, but the date passed to the file changes. For example, today it will run for today's date, then repeat for yesterday's date, then for 2 days ago, all the way back to 21 days ago. So it will run on 7/14/2021, and I need to pass that date to my sql file; then it will run for 7/13/2021, and the date I need to pass to my sql file is 7/13/2021. How can I have this dag repeat for a date range and dynamically pass each date to the sql file?
In the BigQueryOperator, variables are passed in the user_defined_macros section, so I don't know how to change the date I am passing. I thought about looping over an array of dates, but I don't know how to pass each date to the sql file linked in the BigQueryOperator.
My sql file is 300 lines long, so I included a simple example below, as people seem to ask for one.
DAG
with DAG(
    dag_id,
    schedule_interval='0 12 * * *',
    start_date=datetime(2021, 1, 1),
    template_searchpath='/opt/airflow/dags',
    catchup=False,
    user_defined_macros={"varsToPass": Var1}
) as dag:
    query_one = BigQueryOperator(
        task_id='query_one',
        sql='/sql/something.sql',
        use_legacy_sql=False,
        destination_dataset_table='table',
        write_disposition='WRITE_TRUNCATE'
    )
sql file
SELECT * FROM table WHERE date = {{CHANGING_DATE}}
Your code is confusing: you describe a repeated pattern of today, today - 1 day, ..., today - 21 days, yet your code shows write_disposition='WRITE_TRUNCATE', which means only the LAST query matters because each query erases the result of the previous one. Since no more information was provided, I assume you actually mean to run a single query covering the range from today - 21 days to today.
Also, you didn't mention whether the date you are referring to is the Airflow execution_date or today's date.
If it's execution_date, you don't need to pass any parameters; the SQL just needs to be:
SELECT * FROM table WHERE date BETWEEN '{{ execution_date - macros.timedelta(days=21) }}'
AND '{{ execution_date }}'
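For a run whose execution_date is 2021-07-14T12:00:00+00:00, for example, this renders to something like:
SELECT * FROM table WHERE date BETWEEN '2021-06-23T12:00:00+00:00' AND '2021-07-14T12:00:00+00:00'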
If it's today, then you need to pass a parameter with params:
from datetime import datetime, timedelta

query_one = BigQueryOperator(
    task_id='query_one',
    sql='/sql/something.sql',
    use_legacy_sql=False,
    destination_dataset_table='table',
    write_disposition='WRITE_TRUNCATE',
    params={
        "end": datetime.utcnow().strftime('%Y-%m-%d'),
        "start": (datetime.utcnow() - timedelta(days=21)).strftime('%Y-%m-%d')
    }
)
Then in the SQL you can use it as:
SELECT * FROM table WHERE date BETWEEN '{{ params.start }}' AND '{{ params.end }}'
I'd like to point out that if you are not using execution_date, then I don't see the value of passing the date from Airflow. You can just do it directly in BigQuery by setting the query to:
SELECT *
FROM table
WHERE date BETWEEN DATE_SUB(current_date(), INTERVAL 21 DAY) AND current_date()
If my assumption was incorrect and you want to run 21 queries, then you can do that with a loop as you described:
from datetime import datetime, timedelta

from airflow.contrib.operators.bigquery_operator import BigQueryOperator

a = []
for i in range(0, 21):
    a.append(
        BigQueryOperator(
            task_id=f'query_{i}',
            sql='/sql/something.sql',
            use_legacy_sql=False,
            destination_dataset_table='table',
            write_disposition='WRITE_TRUNCATE',  # This is probably wrong, I just copied it from your code.
            params={
                "date_value": (datetime.now() - timedelta(days=i)).strftime('%Y-%m-%d')
            }
        )
    )
    if i > 0:
        a[i - 1] >> a[i]
Then in your /sql/something.sql the query should be:
SELECT * FROM table WHERE date = '{{ params.date_value }}'
As mentioned, this will create a sequential workflow: query_0 >> query_1 >> ... >> query_20.
Note also that BigQueryOperator is deprecated. You should use BigQueryExecuteQueryOperator, which is available in the Google provider via:
from airflow.providers.google.cloud.operators.bigquery import BigQueryExecuteQueryOperator
For more information about how to install the Google provider, please see the 2nd part of the following answer.
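As a rough sketch (assuming the apache-airflow-providers-google package is installed), the migration is mostly a rename, since BigQueryExecuteQueryOperator accepts the same core arguments:
from airflow.providers.google.cloud.operators.bigquery import BigQueryExecuteQueryOperator

# same task as above; only the operator class changes
query_one = BigQueryExecuteQueryOperator(
    task_id='query_one',
    sql='/sql/something.sql',
    use_legacy_sql=False,
    destination_dataset_table='table',
    write_disposition='WRITE_TRUNCATE',
)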
I have a date value in one page item and a time value in another. I want to concatenate the date with the time and set the result into another item using a dynamic action. Structure as below:
P_DATE = '01-01-2021' (item is a date with format mask dd-mm-yyyy)
P_TIME = '08:30 AM' (item is a date with format mask HH:MIAM)
Query:
select to_date(to_char(to_date(P_DATE ,'DD-MON-YYYY'), 'DD-MON-YYYY') || ' '||
to_char(P_TIME,'HH:MIAM'),'DD-MON-YYYY HH:MIAM') a
from dual;
Desired Outcome: 01-01-2021 08:30 AM
The dynamic action is on change of the P_DATE item; it runs an SQL query that concatenates P_DATE and P_TIME and sets the result into P_VALUE.
When I run the select in SQL Developer with hardcoded values it returns the correct result, but when I try to set the item value with the concatenated date, it sometimes gives me an "invalid number" error and sometimes "not a valid month".
Can you suggest the correct way or an alternative way of doing this (maybe using a function)?
Thank you.
You can make that much simpler. For example:
with test as (
    select '01-01-2021' d, '08:30 PM' t
    from dual)
select to_date(d || ' ' || t, 'DD-MM-YYYY HH:MI PM')
from test;
What do you mean by "they are date items"? Are they of the type "Date Picker", do they have a source column of type DATE in a form, or do they map to a DATE column in the database?
In APEX, all page items are basically strings. The frontend of the web application doesn't know about Oracle datatypes, so everything is treated as a plain string and the conversion is done during processing. That is how you should treat the page items: not as the DATE data type you would use in SQL or PL/SQL. To concatenate a date string and a time string, you can use plain concatenation without TO_CHAR. This can be done in plain PL/SQL; there is no need to SELECT FROM DUAL - that is just an unnecessary call to the SQL engine.
This is the "true" action on the change of P_VALUE. Tested on 21.1, so depending on your version there might be some attribute naming differences but it works the same.
Action: execute server side code.
Source:
:P_VALUE := :P_DATE || ' ' || :P_TIME;
Items to submit: P_DATE, P_TIME.
Items to return: P_VALUE
Since you're dealing with strings, there is room for error here; you'll have to ensure proper error handling if the user input does not exactly match the format, since that could generate invalid date values.
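As a minimal sketch of that error handling, assuming the format masks from the question (the error message text is illustrative):
DECLARE
    l_check DATE;
BEGIN
    -- parse once just to validate the combined string
    l_check := TO_DATE(:P_DATE || ' ' || :P_TIME, 'DD-MM-YYYY HH:MIAM');
    :P_VALUE := :P_DATE || ' ' || :P_TIME;
EXCEPTION
    WHEN OTHERS THEN
        raise_application_error(-20001,
            'Invalid date/time input: ' || :P_DATE || ' ' || :P_TIME);
END;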
From comments:
the page items are both date items
Since they are dates, you can simply use date arithmetic: p_time - TRUNC(p_time) extracts the time of day as a fraction of a day, which you then add to p_date:
SELECT p_date + (p_time - TRUNC(p_time)) AS a
FROM DUAL;
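If you want to verify this outside APEX, a quick self-contained example with the question's sample values:
with test as (
    select to_date('01-01-2021', 'DD-MM-YYYY') p_date,
           to_date('08:30 AM', 'HH:MIAM') p_time
    from dual)
select p_date + (p_time - trunc(p_time)) a
from test;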
I am trying to insert data from one BigQuery table to another using an Airflow DAG. I want to filter data such that the updateDate in my source table is greater than the previous execution date of my DAG run.
The updateDate in my source table looks like this: 2021-04-09T20:11:11Z and is of STRING data type, whereas prev_execution_date looks like this: 2021-04-10T11:00:00+00:00, which is why I am trying to convert my updateDate to TIMESTAMP first and then to ISO format, as shown below.
SELECT *
FROM source_table
WHERE FORMAT_TIMESTAMP("%Y-%m-%dT%X%Ez", TIMESTAMP(UpdateDate)) > TIMESTAMP('{{ prev_execution_date }}')
But I am getting the error message: "No matching signature for operator > for argument types: STRING, TIMESTAMP. Supported signature: ANY > ANY." Clearly the left-hand side of my WHERE clause above is of type STRING. How can I convert it to TIMESTAMP, or to a correct format for that matter, to be able to compare it to prev_execution_date?
I have also tried with the following:
WHERE FORMAT_TIMESTAMP("%Y-%m-%dT%X%Ez", TIMESTAMP(UpdatedWhen)) > STRING('{{ prev_execution_date }}')
which results in the error message: Could not cast literal "2021-04-11T11:50:31.284349+00:00" to type DATE
I would appreciate some help with writing my BigQuery SQL query so that I can compare the string timestamp to the previous execution date of the Airflow DAG.
You probably wanted to try parse_timestamp instead:
SELECT *
FROM source_table
WHERE PARSE_TIMESTAMP("%Y-%m-%dT%X%Ez", UpdateDate) > TIMESTAMP('{{ prev_execution_date }}')
although it looks like it will work even without it:
SELECT *
FROM source_table
WHERE TIMESTAMP(UpdateDate) > TIMESTAMP('{{ prev_execution_date }}')
I'm trying to make a community connector using the advanced services from Google's Data Studio to connect to my BigQuery data table. The connector is all set up and my getData function returns a query which looks like:
var sqlString = "SELECT * FROM `PROJECT.DATASET.TABLE` WHERE " +
"DATE(timestamp) >= #startDate AND DATE(timestamp) <= #endDate;"
where PROJECT, DATASET, and TABLE are filled in with their respective IDs. The 'timestamp' field is a BigQuery field in my data table of type TIMESTAMP.
In my getConfig function, I'm setting the configuration to add a daterange object to the request passed into getData:
function getConfig() {
...
config.setDateRangeRequired(true);
...
}
I'm then returning the community connector object (defined as the cc variable in the code below) in my getData function, setting the sql string, the query parameters for startDate and endDate, and some other necessary info:
function getData(request) {
    ...
    return cc
        .newBigQueryConfig()
        .setAccessToken(accessToken) // defined earlier
        .setBillingProjectId(billingProjectId) // defined earlier
        .setUseStandardSql(true)
        .setQuery(sqlString)
        .addQueryParameter('startDate', bqTypes.STRING,
            request.dateRange.startDate)
        .addQueryParameter('endDate', bqTypes.STRING,
            request.dateRange.endDate);
}
When I run this connector in a report, it connects to BigQuery and even queries the table, but it does not return any data. When I replace @startDate and @endDate with string literals of format 'yyyy-mm-dd', it works as expected, so it seems my only problem is that I can't figure out how to set the date range parameters in the query (which I assume I'm supposed to do to allow date range control in Data Studio reports). How do I configure this daterange object so that people can use date range controls in Data Studio reports?
Edit: For clarification, I know how to add a date range control to a report. The problem is that the query does not return any data even when the date range query parameters are passed in.
I ended up fixing my SQL query. I made my WHERE condition:
WHERE DATE(requestTimestamp) BETWEEN @startDate AND @endDate
and it returned data correctly. I didn't mention another parameter I was using in my query because I thought it was irrelevant, but I had quotes around another parameterized condition, which may have broken the query. The condition before was more like:
WHERE id = '@id' AND DATE(requestTimestamp) BETWEEN @startDate AND @endDate
I think putting quotes around @id was the problem (with quotes, @id is read as a literal string instead of a query parameter), because changing the query to:
WHERE id = @id AND DATE(requestTimestamp) BETWEEN @startDate AND @endDate
worked perfectly.
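Tying that back to getData, the extra parameter is bound the same way as the dates; here configuredId is a hypothetical value taken from the connector's config, not something from the original post:
var sqlString = "SELECT * FROM `PROJECT.DATASET.TABLE` WHERE " +
    "id = @id AND DATE(requestTimestamp) BETWEEN @startDate AND @endDate;";

return cc
    .newBigQueryConfig()
    .setAccessToken(accessToken)
    .setBillingProjectId(billingProjectId)
    .setUseStandardSql(true)
    .setQuery(sqlString)
    .addQueryParameter('id', bqTypes.STRING, configuredId) // no quotes around @id in the SQL
    .addQueryParameter('startDate', bqTypes.STRING, request.dateRange.startDate)
    .addQueryParameter('endDate', bqTypes.STRING, request.dateRange.endDate);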
You can use a Date range control and configure your timestamp field on it. It should automatically pick up the timestamp-type field.
Go to Insert and select Date range control to add it to your report.
You can then select the date range in view mode.
In BigQuery, I can compose a query and then set a destination table under More > Query Settings. This works as expected for queries without variables, for example:
SELECT * FROM foo.bar WHERE PARSE_TIMESTAMP("%a, %d %b %Y %X %z", date_created) > '2020-01-01 00:00:00';
However, when I try to replace that formatting string with a variable, suddenly the options to set a destination table do not exist under More > Query Settings. For example:
DECLARE date_format STRING DEFAULT "%a, %d %b %Y %X %z";
SELECT * FROM foo.bar WHERE PARSE_TIMESTAMP(date_format, date_created) > '2020-01-01 00:00:00';
Additionally, even when I try to schedule the second query, I do not have an option to set a destination table.
Is this behavior expected? Is it documented anywhere? I have been unable to find an explanation.
It is not because of the use of parameters per se, but rather a limitation of scripting: the DECLARE statement turns your query into a script.
So, YES, it is expected: when you use scripting, you cannot set a destination table; otherwise you will get an error.
If you need to get the result into some table, just use INSERT INTO or any other relevant DML/DDL within your script.
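A minimal sketch using the question's query; the destination table name my_dataset.my_destination is a placeholder, not something from the original post:
DECLARE date_format STRING DEFAULT "%a, %d %b %Y %X %z";

-- the CREATE OR REPLACE TABLE statement takes the place of the destination-table setting
CREATE OR REPLACE TABLE my_dataset.my_destination AS
SELECT * FROM foo.bar
WHERE PARSE_TIMESTAMP(date_format, date_created) > '2020-01-01 00:00:00';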