DBT set variable using macros - google-bigquery

My goal is to get the last two dates from a table and run an insert_overwrite incremental load on a large table. I am trying to set a variable inside the model by calling a macro I wrote. The SQL runs on BigQuery.
I get this error message:
'None' has no attribute 'table'
Inside the model:
{% set dates = get_last_two_dates('window_start',source('raw.event','tmp')) %}
The macro:
{% macro get_last_two_dates(target_column_name, target_table = this) %}
{% set query %}
select string_agg(format('%T',target_date),',') target_date_string
from (
SELECT distinct date({{ target_column_name }}) target_date
FROM {{ target_table }}
order by 1 desc
LIMIT 2
) a
{% endset %}
{% set max_value = run_query(query).columns[0][0] %}
{% do return(max_value) %}
{% endmacro %}
Thanks in advance. Let me know if you have any other questions.

You probably need to wrap {% set max_value ... %} with an {% if execute %} block:
{% macro get_last_two_dates(target_column_name, target_table = this) %}
{% set query %}
select string_agg(format('%T',target_date),',') target_date_string
from (
SELECT distinct date({{ target_column_name }}) target_date
FROM {{ target_table }}
order by 1 desc
LIMIT 2
) a
{% endset %}
{% if execute %}
{% set max_value = run_query(query).columns[0][0] %}
{% else %}
{% set max_value = "" %}
{% endif %}
{% do return(max_value) %}
{% endmacro %}
The reason for this is that your macro actually gets run twice -- once when dbt is scanning all of the models to build the DAG, and a second time when the model is actually run. execute is only true for this second pass.
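With the guard in place, here is a minimal sketch of how the macro's output might be consumed in the model, assuming the source and column from the question and an incremental materialization. Since format('%T', ...) renders each value as a BigQuery DATE literal, the returned string can be dropped straight into an in list:
{% set dates = get_last_two_dates('window_start', source('raw.event', 'tmp')) %}

select *
from {{ source('raw.event', 'tmp') }}
{% if is_incremental() %}
where date(window_start) in ({{ dates }})
{% endif %}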

Related

Execute dbt model only if var is not empty list

I have a dbt incremental model that looks roughly like this:
-- depends_on: {{ref('stg_table')}}
{% set dates_query %}
SELECT DISTINCT date FROM dates_table
{% if is_incremental() %}
WHERE date NOT IN (SELECT DISTINCT date FROM {{this}})
{% endif %}
{% endset %}
{% set dates_res = run_query(dates_query) %}
{% if execute %}
{# Return the first column #}
{% set dates_list = dates_res.columns[0].values() %}
{% else %}
{% set dates_list = [] %}
{% endif %}
{% if dates_list %}
with
{% for date in dates_list %}
prel_{{date | replace('-', '_')}} as (
SELECT smth FROM {{ref('stg_table')}}
WHERE some_date = cast('{{date}}' as date)
),
{% endfor %}
prel AS (
select * from prel_{{dates_list[0] | replace('-', '_')}}
{% for date in dates_list[1:] %}
union all
select * from prel_{{date | replace('-', '_')}}
{% endfor %}
)
SELECT some_transformations FROM prel
{% endif %}
But it fails with an error, because it runs the following statement in the database:
create or replace view model__dbt_tmp
as (
-- depends_on: stg_table
);
So the question is: how can I skip the model creation if the dates list is empty?
Thanks :)
You need a valid query that has the right columns but returns zero rows. This should work:
{% if dates_list %}
with
{% for date in dates_list %}
prel_{{date | replace('-', '_')}} as (
SELECT smth FROM {{ref('stg_table')}}
WHERE some_date = cast('{{date}}' as date)
),
{% endfor %}
prel AS (
select * from prel_{{dates_list[0] | replace('-', '_')}}
{% for date in dates_list[1:] %}
union all
select * from prel_{{date | replace('-', '_')}}
{% endfor %}
)
{% else %}
with
prel AS (
SELECT smth FROM {{ref('stg_table')}}
WHERE 1=0
)
{% endif %}
SELECT some_transformations FROM prel
Separately, I would make other simplifications to your code. Jinja provides a loop variable inside for loops, with flags called loop.first and loop.last that are only true on the first and last iterations. So your for loop can become:
prel AS (
{% for date in dates_list %}
{% if not loop.first %}union all{% endif %}
select * from prel_{{date | replace('-', '_')}}
{% endfor %}
)
But really I don't think you need to do all of this work with CTEs and unioning. Your RDBMS probably supports the in operator with dates, and/or this could just be a join.
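For example, a minimal sketch of the in-based version, assuming the same dates_table and column names from the question:
select smth
from {{ ref('stg_table') }}
where some_date in (
    select distinct date from dates_table
    {% if is_incremental() %}
    where date not in (select distinct date from {{ this }})
    {% endif %}
)
This keeps the incremental filtering entirely in SQL, so no Jinja list handling or execute guard is needed.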

DBT run model only once

I've created a model to generate a calendar dimension, which I only want to run when I explicitly specify it.
I tried to use incremental materialisation with nothing in the is_incremental() block, hoping dbt would do nothing if there was no query to satisfy the temporary view. Unfortunately this didn't work.
Any suggestion or thoughts for how I might achieve this greatly appreciated.
Regards,
Ashley
I've used a tag for this. Let's call this kind of thing a "static" model. In your model:
{{ config(tags=['static']) }}
and then in your production job:
dbt run --exclude tag:static
This doesn't quite achieve what you want, since you have to add the selector at the command line. But it's simple and self-documenting, which is nice.
I think you should be able to hack the incremental materialization to do this. dbt will complain about empty models, but you should be able to return a query with zero records. It'll depend on your RDBMS whether this is really much better/faster/cheaper than just running the model, since dbt will still execute a query with the complex merge logic.
{{ config(materialized='incremental') }}
{% if is_incremental() %}
select * from {{ this }} limit 0
{% else %}
-- your model here, e.g.
{{ dbt_utils.date_spine( ... ) }}
{% endif %}
Your last/best option is probably to create a custom materialization that checks for an existing relation and no-ops if it finds one. You could borrow most of the code from the incremental materialization to do this. (You would add this as a macro in your project.) I haven't tested this, but to give you an idea:
-- macros/static_materialization.sql
{% materialization static, default -%}
-- relations
{%- set existing_relation = load_cached_relation(this) -%}
{%- set target_relation = this.incorporate(type='table') -%}
{%- set temp_relation = make_temp_relation(target_relation)-%}
{%- set intermediate_relation = make_intermediate_relation(target_relation)-%}
{%- set backup_relation_type = 'table' if existing_relation is none else existing_relation.type -%}
{%- set backup_relation = make_backup_relation(target_relation, backup_relation_type) -%}
-- configs
{%- set unique_key = config.get('unique_key') -%}
{%- set full_refresh_mode = (should_full_refresh() or existing_relation.is_view) -%}
{%- set on_schema_change = incremental_validate_on_schema_change(config.get('on_schema_change'), default='ignore') -%}
-- the temp_ and backup_ relations should not already exist in the database; get_relation
-- will return None in that case. Otherwise, we get a relation that we can drop
-- later, before we try to use this name for the current operation. This has to happen before
-- BEGIN, in a separate transaction
{%- set preexisting_intermediate_relation = load_cached_relation(intermediate_relation)-%}
{%- set preexisting_backup_relation = load_cached_relation(backup_relation) -%}
-- grab the current table's grants config for comparison later on
{% set grant_config = config.get('grants') %}
{{ drop_relation_if_exists(preexisting_intermediate_relation) }}
{{ drop_relation_if_exists(preexisting_backup_relation) }}
{{ run_hooks(pre_hooks, inside_transaction=False) }}
-- `BEGIN` happens here:
{{ run_hooks(pre_hooks, inside_transaction=True) }}
{% set to_drop = [] %}
{% if existing_relation is none %}
{% set build_sql = get_create_table_as_sql(False, target_relation, sql) %}
{% elif full_refresh_mode %}
{% set build_sql = get_create_table_as_sql(False, intermediate_relation, sql) %}
{% set need_swap = true %}
{% else %}
{# ----- only changed the code between these comments ----- #}
{# NO-OP. An incremental materialization would do a merge here #}
{% set build_sql = "select 1" %}
{# ----- only changed the code between these comments ----- #}
{% endif %}
{% call statement("main") %}
{{ build_sql }}
{% endcall %}
{% if need_swap %}
{% do adapter.rename_relation(target_relation, backup_relation) %}
{% do adapter.rename_relation(intermediate_relation, target_relation) %}
{% do to_drop.append(backup_relation) %}
{% endif %}
{% set should_revoke = should_revoke(existing_relation, full_refresh_mode) %}
{% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}
{% do persist_docs(target_relation, model) %}
{% if existing_relation is none or existing_relation.is_view or should_full_refresh() %}
{% do create_indexes(target_relation) %}
{% endif %}
{{ run_hooks(post_hooks, inside_transaction=True) }}
-- `COMMIT` happens here
{% do adapter.commit() %}
{% for rel in to_drop %}
{% do adapter.drop_relation(rel) %}
{% endfor %}
{{ run_hooks(post_hooks, inside_transaction=False) }}
{{ return({'relations': [target_relation]}) }}
{%- endmaterialization %}
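If you save the materialization above in your project, a model would opt into it the same way as any built-in materialization:
{{ config(materialized='static') }}
-- your calendar dimension query here, e.g.
{{ dbt_utils.date_spine( ... ) }}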
We are working with dbt run --select MODEL_NAME for each model we want to run, so a dbt run in our environment never executes more than one model. By doing so you never run into a situation where you execute a model by accident.

DBT Macro With Parameter IF statement eval

I am trying to create a SQL template using a macro with one parameter, but the if condition does not evaluate to true when passed TABLE1 or TABLE2:
{% macro cloud_test_results_get_standard_columns(modelName) %}
result,
Length,
estimatedLength as estimatedLength,
{% if '{{modelName}}' == 'TABLE1' %}
TABL1_COL1,
TABL1_COL1,
TABL1_COL1,
{% elif '{{modelName}}' == 'TABLE2' %}
TABL1_COL1,
TABL1_COL1,
TABL1_COL1,
{% else %}
TABL_DEFAULT1,
TABL_DEFAULT2,
TABL_DEFAULT3,
{% endif %}
{% endmacro %}
Please disregard; I had to use modelName instead of '{{modelName}}' inside the if block.
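For reference, a sketch of the corrected macro (the column lists are placeholders): inside an {% if %} tag you are already in a Jinja expression, so the parameter is referenced bare, not wrapped in quotes and curly braces:
{% macro cloud_test_results_get_standard_columns(modelName) %}
result,
Length,
estimatedLength as estimatedLength,
{% if modelName == 'TABLE1' %}
TABL1_COL1,
TABL1_COL2,
TABL1_COL3,
{% elif modelName == 'TABLE2' %}
TABL2_COL1,
TABL2_COL2,
TABL2_COL3,
{% else %}
TABL_DEFAULT1,
TABL_DEFAULT2,
TABL_DEFAULT3,
{% endif %}
{% endmacro %}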

How to create histogram bins for use in dbt using Jinja template?

I am trying to create histogram bins in dbt using jinja. This is the code I am using.
{% set sql_statement %}
select min(eir) as min_eir, floor((max(eir) - min(eir))/10) + 1 as bin_size from {{ ref('interest_rate_table') }}
{% endset %}
{% set query_result = dbt_utils.get_query_results_as_dict(sql_statement) %}
{% set min_eir = query_result['min_eir'][0] %}
{% set bin_size = query_result['bin_size'][0] %}
{% set eir_bucket = [] %}
{% for i in range(10) %}
{% set eir_bucket = eir_bucket.append(min_eir + i*bin_size) %}
{% endfor %}
{{ log(eir_bucket, info=True) }}
select 1 as num
The above code raises dbt.exceptions.UndefinedMacroException.
Below is the error log.
dbt.exceptions.UndefinedMacroException: Compilation Error in model terms_dist (/my/file/dir)
'bin_size' is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps".
Now, I haven't written the SQL yet. I want to build an array containing the histogram bins that I can use in my code.
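Two things likely break here. First, as with the execute guard described in the first answer above, query results are not available at parse time, so the result lookups should be wrapped in {% if execute %}. Second, Python's list.append() returns None, so {% set eir_bucket = eir_bucket.append(...) %} replaces the list with None on the first pass; use {% do %} instead. A sketch of the fixed section, keeping the sql_statement block from the question:
{% set eir_bucket = [] %}
{% if execute %}
{% set query_result = dbt_utils.get_query_results_as_dict(sql_statement) %}
{% set min_eir = query_result['min_eir'][0] %}
{% set bin_size = query_result['bin_size'][0] %}
{% for i in range(10) %}
{% do eir_bucket.append(min_eir + i*bin_size) %}
{% endfor %}
{% endif %}
{{ log(eir_bucket, info=True) }}
select 1 as num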

Macro to surface models to other schemas - dbt_utils.star()

Problem
Currently in my CI process, I surface specific models to multiple schemas. This is generally my current process:
macros/surface_models.sql
{% set model_views = [] %}
{% for node in graph.nodes.values() %}
{% if some type of filtering criteria %}
{%- do model_views.append(node.alias) -%}
{% endif %}
{% endfor %}
{% for view in model_views %}
{% set query %}
create view my_other_schema.{{ view }} as (select * from initial_schema.{{ view }});
{% endset %}
{{ run_query(query) }}
{% endfor %}
While this works, if the underlying table or view's definition changes, the views created by the above macro will return an error like: QUERY EXPECTED X COLUMNS BUT GOT Y
I could fix this by writing each query with explicit column names:
select id, updated_at from table
not
select * from table
Question
Is there a way to utilize the above macro concept but using {{ dbt_utils.star() }} instead of *?
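One hedged sketch of that, assuming dbt_utils is installed and the macro is run after the models exist (for example via dbt run-operation or an on-run-end hook, since dbt_utils.star() queries the warehouse for column names). api.Relation.create builds a relation object for each node so star() knows where to look; the schema names are the placeholders from the question:
{% macro surface_models() %}
{% for node in graph.nodes.values() %}
{% if node.resource_type == 'model' %} {# plus your other filtering criteria #}
{% set relation = api.Relation.create(database=node.database, schema=node.schema, identifier=node.alias) %}
{% set query %}
create or replace view my_other_schema.{{ node.alias }} as (
select {{ dbt_utils.star(from=relation) }} from {{ relation }}
);
{% endset %}
{% do run_query(query) %}
{% endif %}
{% endfor %}
{% endmacro %}
The trade-off: star() expands to an explicit column list at view-creation time, so the views no longer break when the source definition changes, but they must be re-created to pick up any new columns.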