DBT models: how to create a variable from a query and use it in an IF statement

I have been trying to create variables in my SQL models (defined by a Select statement, not static values), with no success; I'm trying to mimic the Declare/Set statements from SQL stored procedures. I have been using the call statement function to run my statements and then Set to assign their results to variables, but whatever I do I get errors: either that a variable is missing from config, or some compilation error.
Trying to run the following model, which ends in an IF statement:
{%- call statement(name='get_last_snapshot_date', fetch_result=True) -%}
Select ifnull(max(snapshot_date),'9999-09-09') from my_data_source
{%- endcall -%}
{%- set data_last_snapshot_date = load_result('get_last_snapshot_date') -%}
{%- set last_snapshot_date = data_last_snapshot_date['data'][0][0] -%}
{%- call statement(name='get_current_date', fetch_result=True) -%}
Select current_date('GB')
{%- endcall -%}
{%- set data_get_current_date = load_result('get_current_date') -%}
{%- set current_snapshot_date = data_get_current_date['data'][0][0] -%}
{% if current_snapshot_date == last_snapshot_date: %}
Delete From my_data_source
Where snapshot_date = current_snapshot_date
{% endif %}
Gives me the following error:
[2021-11-08 16:48:53,433] {pod_launcher.py:149} INFO - Compilation Error in model inventory_hist_test (models/inventory_hist_test.sql)
[2021-11-08 16:48:53,433] {pod_launcher.py:149} INFO - expected token ':', got '}'
[2021-11-08 16:48:53,433] {pod_launcher.py:149} INFO - line 29
[2021-11-08 16:48:53,434] {pod_launcher.py:149} INFO - {% if {{ current_snapshot_date }} == {{ last_snapshot_date }} %}
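The compiled line in that traceback shows where this goes wrong: inside a {% ... %} tag you are already in Jinja, so nesting {{ ... }} around the variable names makes Jinja start parsing a dict literal, which is exactly the "expected token ':', got '}'" complaint; the if tag also takes no trailing colon. A minimal sketch of the comparison with plain variable names (untested, and assuming the two dates compare cleanly as strings):
{% if current_snapshot_date == last_snapshot_date %}
Delete From my_data_source
Where snapshot_date = '{{ last_snapshot_date }}'
{% endif %}
Note that inside the SQL body itself the value does need the {{ ... }} delimiters (and quoting), because there it is interpolated into the query text rather than evaluated as Jinja.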

Invalid type for parameter 'TO_GEOGRAPHY'

Why does casting
select cast(st_makepoint(-90.345929, 37.278424) as geography)
raise the following error:
SQL compilation error: invalid type [CAST(ST_MAKEPOINT(TO_DOUBLE(-90.345929), TO_DOUBLE(37.278424)) AS GEOGRAPHY)] for parameter 'TO_GEOGRAPHY'
while a seemingly more direct pass of the st_makepoint result to to_geography does not?
select to_geography(st_makepoint(-90.345929, 37.278424))
I'm fairly sure I'm stuck with the casting behavior in the dbt tool I'm using. Basically I'm trying to union a bunch of tables with this geography field, and in the compiled SQL this casting logic appears as a function of dbt's union_relations macro, and I don't seem to be able to control whether the casting occurs.
The source for union_relations is in the dbt_utils package.
You can copy this macro into your own project (under the macros directory) and patch the source, and then call it with union_relations instead of dbt_utils.union_relations.
The offending lines are 106-113. Something like this should work fine:
{% for col_name in ordered_column_names -%}
{%- set col = column_superset[col_name] %}
{%- set col_type = column_override.get(col.column, col.data_type) %}
{%- set col_name = adapter.quote(col_name) if col_name in relation_columns[relation] else 'null' %}
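{# cast() to geography isn't supported here (see the error above), so use to_geography() for geography columns instead #}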
{% if col_type == 'geography' %}
to_geography({{ col_name }}) as {{ col.quoted }}
{% else %}
cast({{ col_name }} as {{ col_type }}) as {{ col.quoted }}
{% endif %}
{%- if not loop.last %},{% endif -%}
{%- endfor %}
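Then, in the model that does the unioning, call the macro without the dbt_utils prefix so your patched copy is resolved instead of the package version (the relation names below are just placeholders):
{{ union_relations(relations=[ref('table_a'), ref('table_b')]) }}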
Because CAST doesn't support that particular combination of source and target data types.

DBT run model only once

I've created a model to generate a calendar dimension which I only want to run when I explicitly specify to run it.
I tried to use incremental materialisation with nothing in the is_incremental() block, hoping dbt would do nothing if there was no query to satisfy the temporary view. Unfortunately this didn't work.
Any suggestions or thoughts on how I might achieve this would be greatly appreciated.
Regards,
Ashley
I've used a tag for this. Let's call this kind of thing a "static" model. In your model:
{{ config(tags=['static']) }}
and then in your production job:
dbt run --exclude tag:static
This doesn't quite achieve what you want, since you have to add the selector at the command line. But it's simple and self-documenting, which is nice.
I think you should be able to hack the incremental materialization to do this. dbt will complain about empty models, but you should be able to return a query with zero records. Whether this is really much better/faster/cheaper than just running the model will depend on your RDBMS, since dbt will still execute a query with the complex merge logic.
{{ config(materialized='incremental') }}
{% if is_incremental() %}
select * from {{ this }} limit 0
{% else %}
-- your model here, e.g.
{{ dbt_utils.date_spine( ... ) }}
{% endif %}
Your last/best option is probably to create a custom materialization that checks for an existing relation and no-ops if it finds one. You could borrow most of the code from the incremental materialization to do this. (You would add this as a macro in your project). Haven't tested this, but to give you an idea:
-- macros/static_materialization.sql
{% materialization static, default -%}
-- relations
{%- set existing_relation = load_cached_relation(this) -%}
{%- set target_relation = this.incorporate(type='table') -%}
{%- set temp_relation = make_temp_relation(target_relation)-%}
{%- set intermediate_relation = make_intermediate_relation(target_relation)-%}
{%- set backup_relation_type = 'table' if existing_relation is none else existing_relation.type -%}
{%- set backup_relation = make_backup_relation(target_relation, backup_relation_type) -%}
-- configs
{%- set unique_key = config.get('unique_key') -%}
{%- set full_refresh_mode = (should_full_refresh() or existing_relation.is_view) -%}
{%- set on_schema_change = incremental_validate_on_schema_change(config.get('on_schema_change'), default='ignore') -%}
-- the temp_ and backup_ relations should not already exist in the database; get_relation
-- will return None in that case. Otherwise, we get a relation that we can drop
-- later, before we try to use this name for the current operation. This has to happen before
-- BEGIN, in a separate transaction
{%- set preexisting_intermediate_relation = load_cached_relation(intermediate_relation)-%}
{%- set preexisting_backup_relation = load_cached_relation(backup_relation) -%}
-- grab current tables grants config for comparison later on
{% set grant_config = config.get('grants') %}
{{ drop_relation_if_exists(preexisting_intermediate_relation) }}
{{ drop_relation_if_exists(preexisting_backup_relation) }}
{{ run_hooks(pre_hooks, inside_transaction=False) }}
-- `BEGIN` happens here:
{{ run_hooks(pre_hooks, inside_transaction=True) }}
{% set to_drop = [] %}
{% if existing_relation is none %}
{% set build_sql = get_create_table_as_sql(False, target_relation, sql) %}
{% elif full_refresh_mode %}
{% set build_sql = get_create_table_as_sql(False, intermediate_relation, sql) %}
{% set need_swap = true %}
{% else %}
{# ----- only changed the code between these comments ----- #}
{# NO-OP. An incremental materialization would do a merge here #}
{% set build_sql = "select 1" %}
{# ----- only changed the code between these comments ----- #}
{% endif %}
{% call statement("main") %}
{{ build_sql }}
{% endcall %}
{% if need_swap %}
{% do adapter.rename_relation(target_relation, backup_relation) %}
{% do adapter.rename_relation(intermediate_relation, target_relation) %}
{% do to_drop.append(backup_relation) %}
{% endif %}
{% set should_revoke = should_revoke(existing_relation, full_refresh_mode) %}
{% do apply_grants(target_relation, grant_config, should_revoke=should_revoke) %}
{% do persist_docs(target_relation, model) %}
{% if existing_relation is none or existing_relation.is_view or should_full_refresh() %}
{% do create_indexes(target_relation) %}
{% endif %}
{{ run_hooks(post_hooks, inside_transaction=True) }}
-- `COMMIT` happens here
{% do adapter.commit() %}
{% for rel in to_drop %}
{% do adapter.drop_relation(rel) %}
{% endfor %}
{{ run_hooks(post_hooks, inside_transaction=False) }}
{{ return({'relations': [target_relation]}) }}
{%- endmaterialization %}
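With that macro in your project, a model opts into it through the usual config block; the body below is just a placeholder reusing the date_spine example from above:
{{ config(materialized='static') }}
-- your model here, e.g.
{{ dbt_utils.date_spine( ... ) }}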
We are working with dbt run --select MODEL_NAME for each model we want to run, so a dbt run in our environment never executes more than one model. By doing so you never run into a situation where you execute a model by accident.

Fixing the error - cannot unpack non-iterable NoneType object in a DBT macro using Jinja

I am writing a macro that uses Snowflake to offload a prebuilt table to an S3 bucket. Normally we just have one table, but in this instance I have 5 or 6 tables to unload into the same S3 bucket. I wrote a for loop to iterate through a list of dictionaries, where file is the name of the file written to S3 and table is the Snowflake table to unload. The code works, but after the unloading I keep getting the error cannot unpack non-iterable NoneType object, which makes me think that the loop is trying to run one last time.
The code I have is as follows:
{% macro send_anaplan_data_to_s3() %}
{{ log('Running Export to S3 Macro ...', info = true) }}
{% set table_names=[{"file":" '/file_name.csv' ", "table":"DATABASE.SCHEMA.TABLE"}] %}
{% for name in table_names %}
{{ log(name.file, info = true) }}
{{ log(name.table, info = true) }}
-----first table-----
-- this block makes the s3 path with dated filename
{%- call statement('send_s3_statement', fetch_result=True) -%}
select concat('s3://data-anaplan',{{ name.file }})
{%- endcall -%}
-- first compiles then executes against db
-- so we need if/else otherwise it will fail on
-- compile when accessing .data[0]
{%- if execute -%}
{%- set result = load_result('send_s3_statement').data[0] -%}
{%- else -%}
{%- set result = [] -%}
{%- endif -%}
-- spot check the resulting filename.
{{ log('resulting filename:', info=True )}}
{{ log(result, info=True )}}
-- send the data to the correct location in S3
{% for r in result -%}
{{ log(r, info=true) }}
{{ log('Unloading to s3', info = true) }}
{%- call statement(auto_begin=true) -%}
COPY INTO '{{r}}' from {{ name.table }}
STORAGE_INTEGRATION = S3_SNOWFLAKE_ANAPLAN_INTEGRATION
SINGLE = TRUE
MAX_FILE_SIZE = 4900000000
OVERWRITE = TRUE
FILE_FORMAT = (TYPE = CSV, FIELD_DELIMITER = ',', FIELD_OPTIONALLY_ENCLOSED_BY = '"', COMPRESSION = NONE, NULL_IF=())
HEADER = TRUE
{%- endcall -%}
{%- if not loop.last -%} , {%- endif %}
{% endfor -%}
-- confirm when done
{{log('Finished.', info=True)}}
{% endfor %}
{% endmacro %}
Any ideas here? Thank you!
This error implies that one of the queries you pass to the db is empty/null. Check the db logs to see where that's happening.
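If nothing obvious shows up in the logs, one way to narrow it down is to build each COPY statement in a set block and log it before executing, so an empty or malformed statement becomes visible in the dbt output; giving the statement call a name also makes it easier to trace. A rough, untested sketch reusing the question's loop variables:
{% set copy_sql %}
COPY INTO '{{ r }}' from {{ name.table }}
-- keep the same STORAGE_INTEGRATION / FILE_FORMAT options as in the original here
{% endset %}
{{ log('about to run: ' ~ copy_sql, info=true) }}
{% call statement('copy_' ~ loop.index, auto_begin=true) %}
{{ copy_sql }}
{% endcall %}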

dbt macro to iterate over item in list within a sql call?

First off, I am a dbt backer! I love this tool and the versatility of it.
When reading some of the docs, I noticed that I might be able to do some meta work on my schemas every time I call a macro.
One such task would be to clean up schemas.
(This has been edited as per discussion within the dbt slack)
dbt run-operation freeze would introspect all of the tables that would be written by dbt run, but with an autogenerated hash (might just be a timestamp) as a prefix. It would output those tables in the schema of my choice and would log the "hash" to the console.
dbt run-operation unfreeze --args '{hash: my_hash}' would then find the tables written with that hash prefix and clean them out of the schema.
I have created such a macro in an older version of dbt and it still works on 0.17.1.
The macro below, item_in_list_query, gets a list of tables from a separate macro, get_tables (also below). That list of tables is then concatenated inside item_in_list_query to compose the desired SQL query, which is then executed. For demonstration, there is also a model in which item_in_list_query is used.
item_in_list_query
{% macro item_in_list_query() %}
{% set tables = get_tables() %}
{{ log("Tables: " ~ tables, True) }}
{% set query %}
select id
from my_tables
{% if tables -%}
where lower(table_name) in {% for t in tables -%} {{ t }} {%- endfor -%}
{%- endif -%}
{% endset %}
{{ log("query: " ~ query, True) }}
{# run_query returns agate.Table (https://agate.readthedocs.io/en/1.6.1/api/table.html). #}
{% set results = run_query(query) %}
{{ log("results: " ~ results, True) }}
{# execute is a Jinja variable that returns True when dbt is in "execute" mode i.e. True when running dbt run but False during dbt compile. #}
{% if execute %}
{# agate.table.rows is agate.MappedSequence in which data that can be accessed either by numeric index or by key. #}
{% set results_list = results.rows %}
{% else %}
{% set results_list = [] %}
{% endif %}
{{ log("results_list: " ~ results_list, True) }}
{{ return(results_list) }}
{% endmacro %}
get_tables
{% macro get_tables() %}
{%- set tables = [
('table1', 'table2')
] -%}
{{ return(tables) }}
{% endmacro %}
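With the example value returned by get_tables, the query composed inside item_in_list_query renders to something like:
select id
from my_tables
where lower(table_name) in ('table1', 'table2')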
model
{%- for item in item_in_list_query() -%}
{%- if not loop.first %} UNION ALL {% endif %}
select {{ item.id }}
{%- endfor -%}

How to check for the existence of an element in a Liquid array (Shopify) without using join

I want to check for array values in an array created from a "split". Is there a way to do it without doing the following:
{%- assign blog_tags_string = blogs.news.all_tags | join: ' ' -%}
{%- if blog_tags_string contains blog_title -%}
{%- assign is_tag_page = true -%}
{%- else -%}
{%- assign is_tag_page = false -%}
{%- endif -%}
Reading the documentation, we can see that:
contains can also check for the presence of a string in an array of strings.
So, no join is necessary, and this will do the job.
{%- if blogs.news.all_tags contains blog_title -%}
...
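So, keeping the same variables as in the question, the whole check becomes:
{%- if blogs.news.all_tags contains blog_title -%}
{%- assign is_tag_page = true -%}
{%- else -%}
{%- assign is_tag_page = false -%}
{%- endif -%}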