DBT set variable using macros - google-bigquery

My goal is to get the last two dates from a table and run an insert_overwrite incremental load on a large table. I am trying to set a variable inside the model by calling a macro I wrote. The SQL runs on BigQuery.
I get this error message:
'None' has no attribute 'table'
Inside the model:
{% set dates = get_last_two_dates('window_start',source('raw.event','tmp')) %}
The macro:
{% macro get_last_two_dates(target_column_name, target_table = this) %}
{% set query %}
select string_agg(format('%T',target_date),',') target_date_string
from (
SELECT distinct date({{ target_column_name }}) target_date
FROM {{ target_table }}
order by 1 desc
LIMIT 2
) a
{% endset %}
{% set max_value = run_query(query).columns[0][0] %}
{% do return(max_value) %}
{% endmacro %}
Thanks in advance. Let me know if you have any other questions.

You probably need to wrap {% set max_value ... %} with an {% if execute %} block:
{% macro get_last_two_dates(target_column_name, target_table = this) %}
{% set query %}
select string_agg(format('%T',target_date),',') target_date_string
from (
SELECT distinct date({{ target_column_name }}) target_date
FROM {{ target_table }}
order by 1 desc
LIMIT 2
) a
{% endset %}
{% if execute %}
{% set max_value = run_query(query).columns[0][0] %}
{% else %}
{% set max_value = "" %}
{% endif %}
{% do return(max_value) %}
{% endmacro %}
The reason for this is that your macro actually gets run twice -- once when dbt is scanning all of the models to build the DAG, and a second time when the model is actually run. execute is only true for this second pass.
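With the guard in place, here is a minimal sketch of how the macro's output might be consumed in the model, assuming the source and column from the question and an incremental materialization. Since format('%T', ...) renders each value as a BigQuery DATE literal, the returned string can be dropped straight into an in list:
{% set dates = get_last_two_dates('window_start', source('raw.event', 'tmp')) %}

select *
from {{ source('raw.event', 'tmp') }}
{% if is_incremental() %}
where date(window_start) in ({{ dates }})
{% endif %}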

Related

Execute dbt model only if var is not empty list

I have a dbt incremental model that looks roughly like this:
-- depends_on: {{ref('stg_table')}}
{% set dates_query %}
SELECT DISTINCT date FROM dates_table
{% if is_incremental() %}
WHERE date NOT IN (SELECT DISTINCT date FROM {{this}})
{% endif %}
{% endset %}
{% set dates_res = run_query(dates_query) %}
{% if execute %}
{# Return the first column #}
{% set dates_list = dates_res.columns[0].values() %}
{% else %}
{% set dates_list = [] %}
{% endif %}
{% if dates_list %}
with
{% for date in dates_list %}
prel_{{date | replace('-', '_')}} as (
SELECT smth FROM {{ref('stg_table')}}
WHERE some_date = cast('{{date}}' as date)
),
{% endfor %}
prel AS (
select * from prel_{{dates_list[0] | replace('-', '_')}}
{% for date in dates_list[1:] %}
union all
select * from prel_{{date | replace('-', '_')}}
{% endfor %}
)
SELECT some_transformations FROM prel
{% endif %}
But it fails with an error, because it runs the following statement in the database:
create or replace view model__dbt_tmp
as (
-- depends_on: stg_table
);
So the question is: how can I skip the model creation if the dates list is empty?
Thanks :)
You need a valid query that has the right columns but returns zero rows. This should work:
{% if dates_list %}
with
{% for date in dates_list %}
prel_{{date | replace('-', '_')}} as (
SELECT smth FROM {{ref('stg_table')}}
WHERE some_date = cast('{{date}}' as date)
),
{% endfor %}
prel AS (
select * from prel_{{dates_list[0] | replace('-', '_')}}
{% for date in dates_list[1:] %}
union all
select * from prel_{{date | replace('-', '_')}}
{% endfor %}
)
{% else %}
with
prel AS (
SELECT smth FROM {{ref('stg_table')}}
WHERE 1=0
)
{% endif %}
SELECT some_transformations FROM prel
Separately, I would make other simplifications to your code. Jinja provides a loop variable inside for loops, with flags called loop.first and loop.last that are only true on the first and last iterations. So your for loop can become:
prel AS (
{% for date in dates_list %}
{% if not loop.first %}union all{% endif %}
select * from prel_{{date | replace('-', '_')}}
{% endfor %}
)
But really I don't think you need to do all of this work with CTEs and unioning. Your RDBMS probably supports the in operator with dates, and/or this could just be a join.
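For example, a minimal sketch of the in-based version, assuming the same dates_table and column names from the question:
select smth
from {{ ref('stg_table') }}
where some_date in (
    select distinct date from dates_table
    {% if is_incremental() %}
    where date not in (select distinct date from {{ this }})
    {% endif %}
)
This keeps the incremental filtering entirely in SQL, so no Jinja list handling or execute guard is needed.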

DBT run model only once

I've created a model to generate a calendar dimension, which I only want to run when I explicitly specify it.
I tried to use incremental materialisation with nothing in the is_incremental() block, hoping dbt would do nothing if there was no query to satisfy the temporary view. Unfortunately this didn't work.
Any suggestion or thoughts for how I might achieve this greatly appreciated.
Regards,
Ashley
I've used a tag for this. Let's call this kind of thing a "static" model. In your model:
{{ config(tags=['static']) }}
and then in your production job:
dbt run --exclude tag:static
This doesn't quite achieve what you want, since you have to add the selector at the command line. But it's simple and self-documenting, which is nice.
I think you should be able to hack the incremental materialization to do this. dbt will complain about empty models, but you should be able to return a query with zero records. It'll depend on your RDBMS whether this is really much better/faster/cheaper than just running the model, since dbt will still execute a query with the complex merge logic.
{{ config(materialized='incremental') }}
{% if is_incremental() %}
select * from {{ this }} limit 0
{% else %}
-- your model here, e.g.
{{ dbt_utils.date_spine( ... ) }}
{% endif %}
Your last/best option is probably to create a custom materialization that checks for an existing relation and no-ops if it finds one. You could borrow most of the code from the incremental materialization to do this. (You would add this as a macro in your project.) I haven't tested this, but to give you an idea:
-- macros/static_materialization.sql
{% materialization static, default -%}
-- relations
{%- set existing_relation = load_cached_relation(this) -%}
{%- set target_relation = this.incorporate(type='table') -%}
{%- set temp_relation = make_temp_relation(target_relation)-%}
{%- set intermediate_relation = make_intermediate_relation(target_relation)-%}
{%- set backup_relation_type = 'table' if existing_relation is none else existing_relation.type -%}
{%- set backup_relation = make_backup_relation(target_relation, backup_relation_type) -%}
-- configs
{%- set unique_key = config.get('unique_key') -%}
{%- set full_refresh_mode = (should_full_refresh() or existing_relation.is_view) -%}
{%- set on_schema_change = incremental_validate_on_schema_change(config.get('on_schema_change'), default='ignore') -%}
-- the temp_ and backup_ relations should not already exist in the database; get_relation
-- will return None in that case. Otherwise, we get a relation that we can drop
-- later, before we try to use this name for the current operation. This has to happen before
-- BEGIN, in a separate transaction
{%- set preexisting_intermediate_relation = load_cached_relation(intermediate_relation)-%}
{%- set preexisting_backup_relation = load_cached_relation(backup_relation) -%}
-- grab the current table's grants config for comparison later on
{% set grant_config = config.get('grants') %}
{{ drop_relation_if_exists(preexisting_intermediate_relation) }}
{{ drop_relation_if_exists(preexisting_backup_relation) }}
{{ run_hooks(pre_hooks, inside_transaction=False) }}
-- `BEGIN` happens here:
{{ run_hooks(pre_hooks, inside_transaction=True) }}
{% set to_drop = [] %}
{% if existing_relation is none %}
{% set build_sql = get_create_table_as_sql(False, target_relation, sql) %}
{% elif full_refresh_mode %}
{% set build_sql = get_create_table_as_sql(False, intermediate_relation, sql) %}
{% set need_swap = true %}
{% else %}
{# ----- only changed the code between these comments ----- #}
{# NO-OP. An incremental materialization would do a merge here #}
{% set build_sql = "select 1" %}
{# ----- only changed the code between these comments ----- #}
{% endif %}
{% call statement("main") %}
{{ build_sql }}
{% endcall %}
{% if need_swap %}
{% do adapter.rename_relation(target_relation, backup_relation) %}
{% do adapter.rename_relation(intermediate_relation, target_relation) %}
{% do to_drop.append(backup_relation) %}
{% endif %}
{% set should_revoke = should_revoke(existing_relation, full_refresh_mode) %}
{% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}
{% do persist_docs(target_relation, model) %}
{% if existing_relation is none or existing_relation.is_view or should_full_refresh() %}
{% do create_indexes(target_relation) %}
{% endif %}
{{ run_hooks(post_hooks, inside_transaction=True) }}
-- `COMMIT` happens here
{% do adapter.commit() %}
{% for rel in to_drop %}
{% do adapter.drop_relation(rel) %}
{% endfor %}
{{ run_hooks(post_hooks, inside_transaction=False) }}
{{ return({'relations': [target_relation]}) }}
{%- endmaterialization %}
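If you save the materialization above in your project, a model would opt into it the same way as any built-in materialization:
{{ config(materialized='static') }}
-- your calendar dimension query here, e.g.
{{ dbt_utils.date_spine( ... ) }}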
We are working with dbt run --select MODEL_NAME for each model we want to run, so a dbt run in our environment never executes more than one model. By doing so you never run into a situation where you execute a model by accident.

DBT Macro With Parameter IF statement eval

I am trying to create a SQL template using a macro with one parameter, but the if condition does not evaluate to true when passed TABLE1 or TABLE2:
{% macro cloud_test_results_get_standard_columns(modelName) %}
result,
Length,
estimatedLength as estimatedLength,
{% if '{{modelName}}' == 'TABLE1' %}
TABL1_COL1,
TABL1_COL1,
TABL1_COL1,
{% elif '{{modelName}}' == 'TABLE2' %}
TABL1_COL1,
TABL1_COL1,
TABL1_COL1,
{% else %}
TABL_DEFAULT1,
TABL_DEFAULT2,
TABL_DEFAULT3,
{% endif %}
{% endmacro %}
Please disregard; I had to use modelName instead of '{{modelName}}' inside the if block.
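For reference, a sketch of the corrected macro (the column lists are placeholders): inside an {% if %} tag you are already in a Jinja expression, so the parameter is referenced bare, not wrapped in quotes and curly braces:
{% macro cloud_test_results_get_standard_columns(modelName) %}
result,
Length,
estimatedLength as estimatedLength,
{% if modelName == 'TABLE1' %}
TABL1_COL1,
TABL1_COL2,
TABL1_COL3,
{% elif modelName == 'TABLE2' %}
TABL2_COL1,
TABL2_COL2,
TABL2_COL3,
{% else %}
TABL_DEFAULT1,
TABL_DEFAULT2,
TABL_DEFAULT3,
{% endif %}
{% endmacro %}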

How to create histogram bins for use in dbt using Jinja template?

I am trying to create histogram bins in dbt using jinja. This is the code I am using.
{% set sql_statement %}
select min(eir) as min_eir, floor((max(eir) - min(eir))/10) + 1 as bin_size from {{ ref('interest_rate_table') }}
{% endset %}
{% set query_result = dbt_utils.get_query_results_as_dict(sql_statement) %}
{% set min_eir = query_result['min_eir'][0] %}
{% set bin_size = query_result['bin_size'][0] %}
{% set eir_bucket = [] %}
{% for i in range(10) %}
{% set eir_bucket = eir_bucket.append(min_eir + i*bin_size) %}
{% endfor %}
{{ log(eir_bucket, info=True) }}
select 1 as num
The above code raises dbt.exceptions.UndefinedMacroException.
Below is the error log.
dbt.exceptions.UndefinedMacroException: Compilation Error in model terms_dist (/my/file/dir)
'bin_size' is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps".
Now, I haven't written the SQL yet. I want to build an array containing the histogram bins that I can use in my code.
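Two things likely break here. First, as with the execute guard described in the first answer above, query results are not available at parse time, so the result lookups should be wrapped in {% if execute %}. Second, Python's list.append() returns None, so {% set eir_bucket = eir_bucket.append(...) %} replaces the list with None on the first pass; use {% do %} instead. A sketch of the fixed section, keeping the sql_statement block from the question:
{% set eir_bucket = [] %}
{% if execute %}
{% set query_result = dbt_utils.get_query_results_as_dict(sql_statement) %}
{% set min_eir = query_result['min_eir'][0] %}
{% set bin_size = query_result['bin_size'][0] %}
{% for i in range(10) %}
{% do eir_bucket.append(min_eir + i*bin_size) %}
{% endfor %}
{% endif %}
{{ log(eir_bucket, info=True) }}
select 1 as num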

Macro to surface models to other schemas - dbt_utils.star()

Problem
Currently in my CI process, I surface specific models to multiple schemas. This is generally my current process:
macros/surface_models.sql
{% set model_views = [] %}
{% for node in graph.nodes.values() %}
{% if some type of filtering criteria %}
{%- do model_views.append(node.alias) -%}
{% endif %}
{% endfor %}
{% for view in model_views %}
{% set query %}
create view my_other_schema.{{ view }} as (select * from initial_schema.{{ view }});
{% endset %}
{{ run_query(query) }}
{% endfor %}
While this works, if the underlying table or view's definition changes, the views created by the above macro will return an error like: QUERY EXPECTED X COLUMNS BUT GOT Y
I could fix this by writing each query with explicit column names:
select id, updated_at from table
not
select * from table
Question
Is there a way to utilize the above macro concept but using {{ dbt_utils.star() }} instead of *?
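One hedged sketch of that, assuming dbt_utils is installed and the macro is run after the models exist (for example via dbt run-operation or an on-run-end hook, since dbt_utils.star() queries the warehouse for column names). api.Relation.create builds a relation object for each node so star() knows where to look; the schema names are the placeholders from the question:
{% macro surface_models() %}
{% for node in graph.nodes.values() %}
{% if node.resource_type == 'model' %} {# plus your other filtering criteria #}
{% set relation = api.Relation.create(database=node.database, schema=node.schema, identifier=node.alias) %}
{% set query %}
create or replace view my_other_schema.{{ node.alias }} as (
select {{ dbt_utils.star(from=relation) }} from {{ relation }}
);
{% endset %}
{% do run_query(query) %}
{% endif %}
{% endfor %}
{% endmacro %}
The trade-off: star() expands to an explicit column list at view-creation time, so the views no longer break when the source definition changes, but they must be re-created to pick up any new columns.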