Need help resolving dbt audit helper test error

I created a macro that calls another macro from the dbt audit_helper package.
This is the code I used:
(
{% set old_etl_relation=adapter.get_relation(
database=target.database,
schema="LANDING",
identifier="STG_ORDERS"
) -%}
{% set dbt_relation=ref('STG_ORDERS') %}
{{ audit_helper.compare_relation_columns(
a_relation=old_etl_relation,
b_relation=dbt_relation
) }}
)
But I am getting this error: jinja2.exceptions.UndefinedError: 'None' has no attribute 'information_schema'
Any help will be really appreciated.

Related

How to use expressions in if statements in DBT

I want to know how, if possible, I can use dbt expressions that are enclosed in two curly brackets ({{ }}), inside a statement that is enclosed in a curly bracket and a percent sign ({% %}).
For example, I want to execute a piece of code in DBT if the table exists. In my head, it would look something like:
{% if {{this}} is not none %}
do something
{% endif %}
But there's a syntax issue here and I can't seem to be able to use expressions inside statement blocks. I have seen the following implementation but I want to know how I can replace source with {{this}}.
{% set table_exists=source('db', 'table') is not none %}
{% if table_exists %}
do something
{% endif %}
These are the docs I have read:
'this' jinja function
jinja and macros
dbt if table exists example
using load_relation to check if model exists
Don't Nest Your Curlies
If you're inside either {{ ... }} or {% ... %}, your code will be executed by the jinja templating engine. this is a variable that is already set in the jinja context. You use {{ this }} in SQL, but if you're already in the jinja context provided by {% ... %}, you can just write this, without the curlies.
Your if block becomes:
{% if this is not none %}
do something
{% endif %}

How do I run SQL model in dbt multiple times by looping through variables?

I have a model in dbt (test_model) that accepts a geography variable (zip, state, region) in the configuration. I would like to run the model three times by looping through the variables, each time running it with a different variable.
Here's the catch: I have a macro shown below that appends the variable to the end of the output table name (i.e., running test_model with zip as the variable outputs a table called test_model_zip). This is accomplished by adding {{ config(alias=var('geo')) }} at the top of the model.
Whether I define the variable within dbt_project.yml, the model itself, or on the CLI, I've been unable to find a way to loop through these variables, each time passing the new variable to the configuration, and successfully create three tables. Do any of you have an idea how to accomplish this? FWIW, I'm using BigQuery SQL.
The macro:
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
{%- if custom_alias_name is none -%}
{{ node.name }}
{%- else -%}
{% set node_name = node.name ~ '_' ~ custom_alias_name %}
{{ node_name | trim }}
{%- endif -%}
{%- endmacro %}
The model, run by entering dbt run --select test_model.sql --vars '{"geo": "zip"}' in the CLI:
{{ config(materialized='table', alias=var('geo')) }}
with query as (select 1 as id)
select * from query
The current output: a single table called test_model_zip.
The desired output: three tables called test_model_zip, test_model_state, and test_model_region.
I would flip this on its head.
dbt doesn't really have a concept for parameterized models, so if you materialize a single model in multiple places, you'll lose lineage (the DAG relationship) and docs/etc. will get all confused.
Much better to create multiple model files that simply call a macro with a different parameter, like this:
geo_model_macro.sql
{% macro geo_model_macro(grain) %}
select
{{ grain }},
count(*)
from {{ ref('my_upstream_table') }}
group by 1
{% endmacro %}
test_model_zip.sql
{{ geo_model_macro('zip') }}
test_model_state.sql
{{ geo_model_macro('state') }}
test_model_region.sql
{{ geo_model_macro('region') }}
If I needed to do this hundreds of times (instead of 3), I would either:
Create a script to generate all of these .sql files for me
Create a new materialization that accepted a list of parameters, but this would be a super-advanced, here-be-dragons approach that is probably only appropriate when you've maxed out your other options.
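The first option above (a script that generates the .sql files) could be sketched in Python roughly as follows. This is a minimal sketch, not a dbt feature: the function name write_geo_models and the models_dir argument are hypothetical, while geo_model_macro and the test_model_<grain>.sql naming are taken from the example above.

```python
from pathlib import Path


def write_geo_models(models_dir, grains, macro_name="geo_model_macro", prefix="test_model"):
    """Write one thin model file per grain; each file only calls the shared macro.

    models_dir is an assumed location inside your dbt project's models folder.
    """
    out_dir = Path(models_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for grain in grains:
        path = out_dir / f"{prefix}_{grain}.sql"
        # Doubled braces in an f-string emit the literal {{ ... }} that Jinja expects.
        path.write_text(f"{{{{ {macro_name}('{grain}') }}}}\n")
        written.append(path)
    return written
```

Calling write_geo_models("models/geo", ["zip", "state", "region"]) would write test_model_zip.sql, test_model_state.sql and test_model_region.sql, each containing a single macro call, so every generated file is still a regular model with its own place in the DAG.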

dbt Snapshot - if not execute

I have a dbt Snapshot that calls a Macro to get a list of column names back from a database.
It works fine when using
dbt run
When using the snapshot command, it fails because the macro doesn't run in execute mode:
dbt snapshot
I am currently using if not execute in the Macro which helps for compiling the project.
{%- if not execute -%}
Is there any way to get around this so I can use the snapshot functionality without doing a run operation on all models, etc.?
Thanks
Edit:
The macro works fine in models when running dbt run.
When placed in a snapshot, it does not run in execute mode, so the "Test" values are returned instead of values from the query.
{% macro GetColumnNames(DatabaseName, SchemaName, TableName) %}
{%- if not execute -%}
{{ return(["Test1","Test2"]) }}
{% endif %}
{%- set QueryRetrieveColumnNames -%}
SELECT
...
, COLUMN_NAME ...
FROM ...
{%- endset -%}
{% set Results = run_query(QueryRetrieveColumnNames) %}
{%- set ColumnNames = Results.columns[3].values() -%}
{{ return(ColumnNames) }}
{% endmacro %}
In the snapshot I'm doing other things, but even just the columns on their own won't work
{% snapshot TestSnapshot %}
{% set Relation = source(...) -%}
{% set ColumnNames = GetColumnNames(Relation.database, Relation.schema, Relation.identifier) -%}
SELECT
'a' AS a
{%- for ColumnName in ColumnNames %}
, "{{ ColumnName.column }}"
{%- endfor %}
FROM {{ source(...) }}
{% endsnapshot %}
I've switched from the Macro to use get_columns_in_relation
{% set ColumnNames = adapter.get_columns_in_relation(Relation) -%}
This fails at parsing, yet runs fine in models.
Parsing Error in snapshot ...
at path ['check_cols']: Undefined is not valid under any of the given schemas
Not sure of the context of this question (dbt Cloud, CLI, etc.), so this is a ballpark solution.
According to the docs on the snapshot command for the CLI, you should be able to use something like:
dbt snapshot --select column_snapshot
if that's the only thing you want to "snapshot"
Additionally, if this is in dbt Cloud, you could create a model and a job with something like the following (I use this for testing pre-hook and post-hook functionality):
one-model.sql
select 1
(any model with valid sql works)
Then for that cloud job:
dbt seed --full-refresh
dbt run --models one-model --full-refresh
dbt snapshot --select column_snapshot

DBT custom schema using folder structure

Is there a way in dbt to derive custom schemas for a model from the folder structure?
For example, say this is my structure:
models
└-- product1
    └-- team1
    |   └-- model1.sql
    └-- team2
        └-- model2.sql
In this case, model1.sql would be created in the schema product1_team1 whereas model2.sql would be created in the schema product1_team2. I guess I can specify those "by hand" in the dbt_project.yml file, but I was wondering if there was a way to do this in an automated way - so that every new model or folder is automatically created in the right schema.
I was looking at custom schema macros (https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-custom-schemas) but it seems to be plain jinja or simple Python built-ins. Not sure how I would be able to access folder paths in those macros.
Also, is there a way to write a macro in Python? That could be relatively straightforward, knowing the file path and using the os module.
You can achieve that using only Jinja functions and dbt context variables.
As you have noticed, we can overwrite the dbt built-in macro that handles the schema's name, and luckily, there's a way to access the model's path using the node variable that is defined in the arguments of the macro.
I used the fqn property for this example:
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%}
{%- if custom_schema_name is none -%}
{# Check if the model does not contain a subfolder (e.g, models created at the MODELS root folder) #}
{% if node.fqn[1:-1]|length == 0 %}
{{ default_schema }}
{% else %}
{# Concat the subfolder(s) name #}
{% set prefix = node.fqn[1:-1]|join('_') %}
{{ prefix | trim }}
{% endif %}
{%- else -%}
{{ default_schema }}_{{ custom_schema_name | trim }}
{%- endif -%}
{%- endmacro %}
The fqn property will return a list based on the location of your model where the first position will be the dbt project name and the last position will be your model's name. So based on your example, we'd have the following:
[<project_name>, 'product1', 'team1', 'model1']
If you run dbt ls --select <model_name>, you'll notice that the output corresponds to what fqn returns.
node.fqn[1:-1] is a Python-style slice: it drops the first and last elements of the list (project name and model name), leaving only the folder path of your model.
With that in mind, the macro first checks whether the model has no subfolder; if so, it returns just the default_schema (target.schema, as defined in profiles.yml). Otherwise, it turns the list into a string with the join Jinja filter.
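To see the slicing and joining in isolation, here is a minimal Python sketch that mirrors the branches of the macro above. The function name schema_from_fqn, the project name my_project and the default schema analytics are hypothetical; only the fqn layout and the [1:-1] slice come from the answer.

```python
def schema_from_fqn(fqn, default_schema, custom_schema_name=None):
    """Mirror the branching of the generate_schema_name macro in plain Python."""
    if custom_schema_name is not None:
        # Same as the macro's else-branch: <default_schema>_<custom_schema_name>.
        return f"{default_schema}_{custom_schema_name.strip()}"
    # fqn is [project_name, *folders, model_name]; [1:-1] keeps only the folders.
    folders = fqn[1:-1]
    if len(folders) == 0:
        # Model sits directly in the models root: fall back to the default schema.
        return default_schema
    return "_".join(folders)


print(schema_from_fqn(["my_project", "product1", "team1", "model1"], "analytics"))  # product1_team1
print(schema_from_fqn(["my_project", "model0"], "analytics"))  # analytics
```

The same slice works for any nesting depth, so a model three folders deep would get a schema built from all three folder names.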
If you want to explore further, log the node variable to see everything it exposes.

dbt cannot create two resources with identical database representations

I have a situation here as below:
There are two models in my dbt project
model-A
{{ config(
materialized='ephemeral',
alias='A_0001',
schema=var('xxx_yyy_dataset')
) }}
model-B
{{ config(
materialized='ephemeral',
alias='B_0002',
schema=var('xxx_yyy_dataset')
) }}
These are then combined by an incremental model in the same schema, materialized as xxx_yyy_dataset.Table_DDD:
{{ config(
materialized='incremental',
alias='Table_DDD',
schema=var('xxx_yyy_dataset')
) }}
SELECT * FROM {{ref('A_0001')}}
UNION ALL
SELECT * FROM {{ref('B_0002')}}
This is working fine and it is ingesting records into target table.
Now I have introduced another model, model-C, in a different package:
model-C
{{ config(
materialized='incremental',
alias='Table_DDD',
schema=var('xxx_yyy_dataset')
) }}
This gives me the following error:
$ dbt compile --profiles-dir=profile --target ide
Running with dbt=0.16.0
Encountered an error:
Compilation Error
dbt found two resources with the database representation "xxx_yyy_dataset.Table_DDD".
dbt cannot create two resources with identical database representations. To fix this,
change the "schema" or "alias" configuration of one of these resources:
- model.eplus_rnc_dbt_project.conrol_outcome_joined (models/controls/payment/fa-join/conrol_outcome_joined.sql)
- model.eplus_rnc_dbt_project.dq_control_outcome_joined (models/controls/dq/dq-join/dq_control_outcome_joined.sql)
I have configured the custom schema and alias macros as below:
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%}
{%- if custom_schema_name is none -%}
{{ default_schema }}
{%- else -%}
{{ custom_schema_name }}
{%- endif -%}
{%- endmacro %}
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
{%- if custom_alias_name is none -%}
{{ node.name }}
{%- else -%}
{{ custom_alias_name | trim }}
{%- endif -%}
{%- endmacro %}
dbt is doing its job here!
You have two models that share the exact same configuration — conrol_outcome_joined and dq_control_outcome_joined.
This means that they'll both try to write to the same table: xxx_yyy_dataset.Table_DDD.
dbt is (rightfully) throwing an error here to avoid a problem.
As the error message suggests, you should update one of your models to use a different schema or alias so that it gets represented in your BigQuery project as a separate table.
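To find which models collide before dbt complains, you can group them by their resolved schema and alias. The Python below is only an illustration of that idea and not dbt's internal implementation: find_collisions and the dictionary layout are hypothetical, and some_other_model with alias Table_EEE is made up for contrast.

```python
from collections import defaultdict


def find_collisions(models):
    """Group models by (schema, alias) and return the targets claimed more than once."""
    by_target = defaultdict(list)
    for model in models:
        by_target[(model["schema"], model["alias"])].append(model["name"])
    return {target: names for target, names in by_target.items() if len(names) > 1}


models = [
    {"name": "conrol_outcome_joined", "schema": "xxx_yyy_dataset", "alias": "Table_DDD"},
    {"name": "dq_control_outcome_joined", "schema": "xxx_yyy_dataset", "alias": "Table_DDD"},
    # Hypothetical third model with a distinct alias; it does not collide.
    {"name": "some_other_model", "schema": "xxx_yyy_dataset", "alias": "Table_EEE"},
]

for (schema, alias), names in find_collisions(models).items():
    print(f"{schema}.{alias} is claimed by: {', '.join(names)}")
```

Any target listed here needs a different schema or alias on all but one of the models that claim it.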
I struggled with the same problem: I wanted a pipeline of tests that would all write to a single incremental table, and it triggered the same error message. I am afraid this is not possible with dbt.
To work around it, I created a model (and table) for each test, plus a main model that selects and unions the results from all the individual test models. In the main model's post_hook I drop the individual tables, so I am left with a single final testing table that holds all the information.
It is not what I really wanted, since it is not dynamic: every new test has to be added to the union in the main model and to the drop statement in the post_hook. However, a test that breaks individually does not break all the other tests, and there is no pile of leftover tables in my database when I start my work. You just need to orchestrate it at the right time for you.
Another possible approach (not tested, so I am not sure it works) is to create a single model. Since dbt cannot make several models write to the same table, the pre_hook would create all the tables you want, the main part of the model would select and union the data from those pre-hook tables, and the post-hook would drop them. That reduces the number of tables written to the database, which is the main drawback of the first approach, even though there they only exist for a short time.