DBT run model only once - dbt

I've created a model to generate a calendar dimension which I only want to run when I explicitly specify to run it.
I tried to use the incremental materialisation with nothing in the is_incremental() block, hoping dbt would do nothing if there was no query to satisfy the temporary view. Unfortunately this didn't work.
Any suggestions or thoughts on how I might achieve this would be greatly appreciated.
Regards,
Ashley

I've used a tag for this. Let's call this kind of thing a "static" model. In your model:
{{ config(tags=['static']) }}
and then in your production job:
dbt run --exclude tag:static
This doesn't quite achieve what you want, since you have to add the selector at the command line. But it's simple and self-documenting, which is nice.
I think you should be able to hack the incremental materialization to do this. dbt will complain about empty models, but you should be able to return a query with zero records. It'll depend on your RDBMS if this is really much better/faster/cheaper than just running the model, since dbt will still execute a query with the complex merge logic.
{{ config(materialized='incremental') }}
{% if is_incremental() %}
select * from {{ this }} limit 0
{% else %}
-- your model here, e.g.
{{ dbt_utils.date_spine( ... ) }}
{% endif %}
Your last/best option is probably to create a custom materialization that checks for an existing relation and no-ops if it finds one. You could borrow most of the code from the incremental materialization to do this. (You would add this as a macro in your project). Haven't tested this, but to give you an idea:
-- macros/static_materialization.sql
{% materialization static, default -%}
-- relations
{%- set existing_relation = load_cached_relation(this) -%}
{%- set target_relation = this.incorporate(type='table') -%}
{%- set temp_relation = make_temp_relation(target_relation)-%}
{%- set intermediate_relation = make_intermediate_relation(target_relation)-%}
{%- set backup_relation_type = 'table' if existing_relation is none else existing_relation.type -%}
{%- set backup_relation = make_backup_relation(target_relation, backup_relation_type) -%}
-- configs
{%- set unique_key = config.get('unique_key') -%}
{%- set full_refresh_mode = (should_full_refresh() or existing_relation.is_view) -%}
{%- set on_schema_change = incremental_validate_on_schema_change(config.get('on_schema_change'), default='ignore') -%}
-- the temp_ and backup_ relations should not already exist in the database; get_relation
-- will return None in that case. Otherwise, we get a relation that we can drop
-- later, before we try to use this name for the current operation. This has to happen before
-- BEGIN, in a separate transaction
{%- set preexisting_intermediate_relation = load_cached_relation(intermediate_relation)-%}
{%- set preexisting_backup_relation = load_cached_relation(backup_relation) -%}
-- grab current tables grants config for comparison later on
{% set grant_config = config.get('grants') %}
{{ drop_relation_if_exists(preexisting_intermediate_relation) }}
{{ drop_relation_if_exists(preexisting_backup_relation) }}
{{ run_hooks(pre_hooks, inside_transaction=False) }}
-- `BEGIN` happens here:
{{ run_hooks(pre_hooks, inside_transaction=True) }}
{% set to_drop = [] %}
{% if existing_relation is none %}
{% set build_sql = get_create_table_as_sql(False, target_relation, sql) %}
{% elif full_refresh_mode %}
{% set build_sql = get_create_table_as_sql(False, intermediate_relation, sql) %}
{% set need_swap = true %}
{% else %}
{# ----- only changed the code between these comments ----- #}
{# NO-OP. An incremental materialization would do a merge here #}
{% set build_sql = "select 1" %}
{# ----- only changed the code between these comments ----- #}
{% endif %}
{% call statement("main") %}
{{ build_sql }}
{% endcall %}
{% if need_swap %}
{% do adapter.rename_relation(target_relation, backup_relation) %}
{% do adapter.rename_relation(intermediate_relation, target_relation) %}
{% do to_drop.append(backup_relation) %}
{% endif %}
{% set should_revoke = should_revoke(existing_relation, full_refresh_mode) %}
{% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}
{% do persist_docs(target_relation, model) %}
{% if existing_relation is none or existing_relation.is_view or should_full_refresh() %}
{% do create_indexes(target_relation) %}
{% endif %}
{{ run_hooks(post_hooks, inside_transaction=True) }}
-- `COMMIT` happens here
{% do adapter.commit() %}
{% for rel in to_drop %}
{% do adapter.drop_relation(rel) %}
{% endfor %}
{{ run_hooks(post_hooks, inside_transaction=False) }}
{{ return({'relations': [target_relation]}) }}
{%- endmaterialization %}

We work with dbt run --select MODEL_NAME for each model we want to run, so a dbt run in our environment never executes more than one model. That way you never run into a situation where you execute a model by accident.

Related

DBT - how can i add model configuration (using a macro on {{this}}) in dbt_project.yml

I want to add node_color to all of my dbt models based on their filename prefix, to make my dbt documentation easier to navigate:
fact_ => red
base__ => black.
To do so, I have a macro that works well:
{% macro get_model_color(model) %}
{% set default_color = 'blue' %}
{% set ns = namespace(model_color=default_color) %}
{% set dict_patterns = {"base__[a-z0-9_]+" : "black", "ref_[a-z0-9_]+" : "yellow", "fact_[a-z0-9_]+" : "red"} %}
{% set re = modules.re %}
{% for pattern, color in dict_patterns.items() %}
{% set is_match = re.match(pattern, model.identifier, re.IGNORECASE) %}
{% if is_match %}
{% set ns.model_color = color %}
{% endif %}
{% endfor %}
{{ return({'node_color': ns.model_color}) }}
{% endmacro %}
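The pattern lookup in this macro can be checked outside dbt. The following is a minimal Python sketch of the same logic, not dbt code: `PATTERNS` and the Python function `get_model_color` are hypothetical stand-ins that copy the macro's dictionary and loop. It assumes the macro's `modules.re` behaves like Python's `re` module, which is what it exposes.

```python
import re

# Same pattern-to-color table as the macro (dict order is preserved in Python 3.7+).
PATTERNS = {
    r"base__[a-z0-9_]+": "black",
    r"ref_[a-z0-9_]+": "yellow",
    r"fact_[a-z0-9_]+": "red",
}


def get_model_color(identifier, default_color="blue"):
    """Return the docs config dict for a model name (hypothetical helper)."""
    color = default_color
    for pattern, candidate in PATTERNS.items():
        # re.match anchors at the start of the string, so each pattern
        # effectively acts as a prefix check on the model name.
        if re.match(pattern, identifier, re.IGNORECASE):
            color = candidate  # no break: the last matching pattern wins
    return {"node_color": color}


print(get_model_color("fact_orders"))
print(get_model_color("stg_orders"))
```

Because `re.match` only matches at the beginning of the string, a name such as `prefact_orders` falls through to the default color rather than being treated as a fact model.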
And I call it in my model .sql:
{{config(
materialized = 'table',
tags=['daily'],
docs=get_model_color(this),
)}}
This works well but forces me to add this line of code to all my models (and to every new one).
Is there a way I can define it in my dbt_project.yml to make it available to all my models automatically?
I have tried many things, like the config Jinja function or this kind of code in dbt_project.yml:
+docs:
node_color: "{{ get_model_color(this) }}"
which returns: Could not render {{ get_model_color(this) }}: 'get_model_color' is undefined
But nothing seems to work.
Any ideas? Thanks

DBT set variable using macros

My goal is to get the last two dates from the table and use insert_overwrite to load a large table incrementally. I am trying to set a variable inside the model by calling a macro I wrote. The SQL query is for BigQuery.
I get this error message:
'None' has no attribute 'table'
inside model
{% set dates = get_last_two_dates('window_start',source('raw.event','tmp')) %}
macros
{% macro get_last_two_dates(target_column_name, target_table = this) %}
{% set query %}
select string_agg(format('%T',target_date),',') target_date_string
from (
SELECT distinct date({{ target_column_name }}) target_date
FROM {{ target_table }}
order by 1 desc
LIMIT 2
) a
{% endset %}
{% set max_value = run_query(query).columns[0][0] %}
{% do return(max_value) %}
{% endmacro %}
Thanks in advance. let me know if you have any other questions.
You probably need to wrap {% set max_value ... %} with an {% if execute %} block:
{% macro get_last_two_dates(target_column_name, target_table = this) %}
{% set query %}
select string_agg(format('%T',target_date),',') target_date_string
from (
SELECT distinct date({{ target_column_name }}) target_date
FROM {{ target_table }}
order by 1 desc
LIMIT 2
) a
{% endset %}
{% if execute %}
{% set max_value = run_query(query).columns[0][0] %}
{% else %}
{% set max_value = "" %}
{% endif %}
{% do return(max_value) %}
{% endmacro %}
The reason for this is that your macro actually gets run twice -- once when dbt is scanning all of the models to build the DAG, and a second time when the model is actually run. execute is only true for this second pass.
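The two passes can be imitated in plain Python. This is only a sketch under an assumption taken from the error message: that during the first (parsing) pass no result object exists, so reading `.table` from it fails. `FakeResult`, `load_result`, and both `get_last_two_dates_*` functions are hypothetical stand-ins, not dbt's implementation.

```python
class FakeResult:
    """Stand-in for a query result; the value is a placeholder, not real data."""
    table = [["placeholder"]]


def load_result(execute):
    # Assumption: no result object is available while dbt is only parsing.
    return FakeResult() if execute else None


def run_query(execute):
    # Reading .table from None reproduces the shape of the reported error.
    return load_result(execute).table


def get_last_two_dates_unguarded(execute):
    # Mirrors the original macro: always runs the query.
    return run_query(execute)[0][0]


def get_last_two_dates_guarded(execute):
    # Mirrors the fixed macro: only run the query when execute is true.
    if execute:
        return run_query(execute)[0][0]
    return ""


for flag in (False, True):
    print(flag, repr(get_last_two_dates_guarded(flag)))
```

Calling `get_last_two_dates_unguarded(False)` raises an `AttributeError` about `NoneType` having no attribute `table`, while the guarded version returns an empty string on the first pass and the real value on the second.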

How to create histogram bins for use in dbt using Jinja template?

I am trying to create histogram bins in dbt using jinja. This is the code I am using.
{% set sql_statement %}
select min(eir) as min_eir, floor((max(eir) - min(eir))/10) + 1 as bin_size from {{ ref('interest_rate_table') }}
{% endset %}
{% set query_result = dbt_utils.get_query_results_as_dict(sql_statement) %}
{% set min_eir = query_result['min_eir'][0] %}
{% set bin_size = query_result['bin_size'][0] %}
{% set eir_bucket = [] %}
{% for i in range(10) %}
{% set eir_bucket = eir_bucket.append(min_eir + i*bin_size) %}
{% endfor %}
{{ log(eir_bucket, info=True) }}
select 1 as num
The above code returns dbt.exceptions.UndefinedMacroException.
Below is the error log.
dbt.exceptions.UndefinedMacroException: Compilation Error in model terms_dist (/my/file/dir)
'bin_size' is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps".
Note that I haven't written the SQL yet. I want to build an array containing the histogram bins that I can use in my code.
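Separately from the reported error, one detail in the loop is worth checking: Jinja calls the underlying Python list method, and `list.append` returns `None`, so assigning its return value back to `eir_bucket` does not keep the list. The custom materialization earlier in this thread uses `{% do to_drop.append(...) %}` for the same kind of operation. A minimal Python sketch of the bin construction follows; `build_bins` and its inputs are hypothetical, and in the model the two values would come from the query result.

```python
def build_bins(min_eir, bin_size, n_bins=10):
    """Return the lower edge of each histogram bin."""
    edges = []
    for i in range(n_bins):
        # append mutates the list in place and returns None, so its
        # return value must not be assigned back to the list variable.
        edges.append(min_eir + i * bin_size)
    return edges


# Placeholder inputs chosen only for illustration.
print(build_bins(2, 3))
```

This sketch does not explain the `'bin_size' is undefined` message by itself; it only shows the intended shape of the bin list.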

Macro to surface models to other schemas - dbt_utils.star()

Problem
Currently in my CI process, I am surfacing specific models built to multiple schemas. This is generally my current process.
macros/surface_models.sql
{% set model_views = [] %}
{% for node in graph.nodes.values() %}
{% if some type of filtering criteria %}
{%- do model_views.append( node.alias ) -%}
{% endif %}
{% endfor %}
{% for view in model_views %}
{% set query %}
create view my_other_schema.{{ view }} as (select * from initial_schema.{{ view }});
{% endset %}
{{ run_query(query) }}
{% endfor %}
While this works, if the underlying table/view's definition changes, the view created by the above macro will return an error like: QUERY EXPECTED X COLUMNS BUT GOT Y
I could fix this by writing each query with explicit column names:
select id, updated_at from table
not
select * from table
Question
Is there a way to utilize the above macro concept but using {{ dbt_utils.star() }} instead of *?

dbt macro to iterate over item in list within a sql call?

First off, I am a dbt backer! I love this tool and the versatility of it.
When reading some of the docs I noticed that I might be able to do some meta work on my schemas every time I call a macro.
One of those would be to clean up schemas.
(This has been edited as per discussion within the dbt slack)
dbt run-operation freeze that would introspect all of the tables that would be written with dbt run but with an autogenerated hash (might just be timestamp). It would output those tables in the schema of my choice and would log the “hash” to console.
dbt run-operation unfreeze --args '{hash: my_hash}' that would then proceed to find the tables written with that hash prefix and clean them out of the schema.
I have created such a macro in an older version of dbt and it still works on 0.17.1.
The macro below item_in_list_query is getting a list of tables from a separate macro get_tables (also below). That list of tables is then concatenated inside item_in_list_query to compose a desired SQL query and execute it. For demonstration there is also a model in which item_in_list_query is used.
item_in_list_query
{% macro item_in_list_query() %}
{% set tables = get_tables() %}
{{ log("Tables: " ~ tables, True) }}
{% set query %}
select id
from my_tables
{% if tables -%}
where lower(table_name) in {% for t in tables -%} {{ t }} {%- endfor -%}
{%- endif -%}
{% endset %}
{{ log("query: " ~ query, True) }}
{# run_query returns agate.Table (https://agate.readthedocs.io/en/1.6.1/api/table.html). #}
{% set results = run_query(query) %}
{{ log("results: " ~ results, True) }}
{# execute is a Jinja variable that is True when dbt is in "execute" mode, i.e. when it is actually compiling or running a model, and False while dbt is only parsing the project. #}
{% if execute %}
{# agate.table.rows is an agate.MappedSequence in which data can be accessed either by numeric index or by key. #}
{% set results_list = results.rows %}
{% else %}
{% set results_list = [] %}
{% endif %}
{{ log("results_list: " ~ results_list, True) }}
{{ return(results_list) }}
{% endmacro %}
get_tables
{% macro get_tables() %}
{%- set tables = [
('table1', 'table2')
] -%}
{{return(tables )}}
{% endmacro %}
model
{%- for item in item_in_list_query() -%}
{%- if not loop.first %} UNION ALL {% endif %}
select {{ item.id }}
{%- endfor -%}
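The reason `get_tables` returns a list containing a single tuple becomes clearer when the rendering is imitated in Python. This is a sketch that assumes each `{{ t }}` is rendered with the value's ordinary string form; `build_query` is a hypothetical helper, not part of dbt.

```python
def build_query(tables):
    """Compose the query text the way item_in_list_query does (sketch)."""
    query = "select id from my_tables"
    if tables:
        # Each element is rendered with str(); a tuple of names renders as a
        # parenthesised, quoted list, which is what SQL expects after IN.
        rendered = "".join(str(t) for t in tables)
        query += " where lower(table_name) in " + rendered
    return query


print(build_query([("table1", "table2")]))
print(build_query([]))
```

One caveat with this approach: a tuple with a single name renders with a trailing comma, `('table1',)`, which most databases reject, so a one-table list would need separate handling.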