With dbt, how do I use a model CTE in a macro call in a Jinja expression?

How would I reference a previously defined cte inside of a macro call, inside of a jinja expression block?
with stg_example_table as (
select *
from {{ ref('stg_db__example_table') }}
where {{ ref('stg_db__example_table') }}.example_column = 'foobar'
),
earliest_date as (
THIS
{{ vvvvvvvvvvvvvvvvv
get_earliest_date('month', 'created_at', stg_example_table)
}} ^^^^^^^^^^^^^^^^^
)
select * from earliest_date
I can't seem to reference the cte, stg_example_table, in that location in a way that works. How should it be referenced in a way that will work?
The macro get_earliest_date() works if I use ref('stg_db__example_table'), but then I get the wrong value, since the table isn't filtered the way the CTE is.
I could create another stg model that has the filters I need and ref() that one, but it'd be nice to just use the cte here.
I have also tried various forms of:
{% set earliest_date = run_query("select min(created_at)::date from stg_db__example_table").columns[0][0] %}
And then referencing the set earliest_date, but I could not get that to work.
For reference, this is the get_earliest_date() macro:
{% macro get_earliest_date(date_component, column_name, relation) %}
{% set query %}
select
date_trunc({{ date_component }}, min({{ column_name }}))::date as earliest
from {{ relation }}
{% endset %}
{% set results = run_query(query) %}
{% if execute %}
{% set result = results.columns[0][0] %}
{% else %}
{% set result = null %}
{% endif %}
{{ return(result) }}
{% endmacro %}
The example code is simplified, but eventually I want to get a date_spine() with:
{{
dbt_utils.date_spine(
datepart = "month",
start_date = get_earliest_date('month', 'created_at', stg_example_table),
end_date = "date_trunc('month', current_date())"
)
}}

dbt doesn't parse the SQL in your model; it only renders the Jinja. Inside the macro call, stg_example_table is therefore just an undefined Jinja variable, not the CTE.
If you're looking to re-use a CTE, it should probably be its own model (a macro would be another choice). You can use the ephemeral materialization and dbt won't persist anything to your warehouse; it just interpolates the model as a CTE. There are some limitations on how you can ref an ephemeral model, but I think in this case, since you're calling get_earliest_date from a model, it should work fine.
-- in stg_example_table.sql
{{ config(materialized="ephemeral") }}
select *
from {{ ref('stg_db__example_table') }}
where {{ ref('stg_db__example_table') }}.example_column = 'foobar'
-- in your_model.sql
...
{{
dbt_utils.date_spine(
datepart = "month",
start_date = get_earliest_date('month', 'created_at', ref('stg_example_table')),
end_date = "date_trunc('month', current_date())"
)
}}
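If referencing the ephemeral model from run_query runs into the limitations mentioned above, another option (not from the answer, just a sketch) is to pass the filtered query itself as the relation argument. This relies on the macro interpolating relation verbatim into its from clause, as the macro in the question does. The names filtered_relation, filtered, and earliest are illustrative. The date is quoted and cast on the assumption that date_spine interpolates start_date as raw SQL.

```sql
{# Sketch only: "filtered" is an illustrative alias, not a name from the question. #}
{% set filtered_relation %}
(
    select *
    from {{ ref('stg_db__example_table') }}
    where example_column = 'foobar'
) as filtered
{% endset %}

{# The macro renders "from ( ... ) as filtered", so the filter is applied. #}
{% set earliest = get_earliest_date('month', 'created_at', filtered_relation) %}

{{
    dbt_utils.date_spine(
        datepart = "month",
        start_date = "cast('" ~ earliest ~ "' as date)",
        end_date = "date_trunc('month', current_date())"
    )
}}
```

This keeps the filter in one file, at the cost of writing it as a Jinja string rather than a CTE.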

Related

Invalid type for parameter 'TO_GEOGRAPHY'

Why does casting
select cast(st_makepoint(-90.345929, 37.278424) as geography)
raise the following error:
SQL compilation error: invalid type [CAST(ST_MAKEPOINT(TO_DOUBLE(-90.345929), TO_DOUBLE(37.278424)) AS GEOGRAPHY)] for parameter 'TO_GEOGRAPHY'
While a seemingly more direct pass of the st_makepoint result to to_geography does not?
select to_geography(st_makepoint(-90.345929, 37.278424))
I'm fairly sure I'm stuck with the casting behavior in the dbt tool I'm using. Basically I'm trying to union a bunch of tables with this geography field, and in the compiled SQL this casting logic appears as a function of dbt's union_relations macro, and I don't seem to be able to control whether the casting occurs.
The source for union_relations is in the dbt-utils package.
You can copy this macro into your own project (under the macros directory) and patch the source, and then call it with union_relations instead of dbt_utils.union_relations.
The offending lines are 106-113. Something like this should work fine:
{% for col_name in ordered_column_names -%}
{%- set col = column_superset[col_name] %}
{%- set col_type = column_override.get(col.column, col.data_type) %}
{%- set col_name = adapter.quote(col_name) if col_name in relation_columns[relation] else 'null' %}
{% if col_type | lower == 'geography' %}
to_geography({{ col_name }}) as {{ col.quoted }}
{% else %}
cast({{ col_name }} as {{ col_type }}) as {{ col.quoted }}
{% endif %}
{%- if not loop.last %},{% endif -%}
{%- endfor %}
Because CAST doesn't support that particular combination of source and target data types.

How do I loop through all columns using Jinja in dbt?

I want to iterate over all the columns using dbt.
You can use the built-in adapter wrapper and adapter.get_columns_in_relation:
{% for col in adapter.get_columns_in_relation(ref('<<your model>>')) -%}
... {{ col.column }} ...
{% endfor %}
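As a slightly fuller sketch of the same loop, the model below trims every string column and passes the others through unchanged. The model name my_model is a placeholder, and the relation is assumed to already exist in the warehouse, since adapter.get_columns_in_relation returns an empty list otherwise.

```sql
{#- Assumption: my_model is a placeholder for a model that is already built. -#}
{%- set columns = adapter.get_columns_in_relation(ref('my_model')) -%}

select
    {%- for col in columns %}
    {%- if col.is_string() %}
    trim({{ col.quoted }}) as {{ col.quoted }}
    {%- else %}
    {{ col.quoted }}
    {%- endif %}
    {%- if not loop.last %},{% endif %}
    {%- endfor %}
from {{ ref('my_model') }}
```

Each col is a dbt Column object, so col.quoted gives the quoted identifier and col.is_string() checks the data type.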
I think the star macro from the dbt-utils package + some for-loop logic might help you here? This depends on the exact use case and warehouse you're using (as pointed out in the comments).
The star macro generates a list of columns in the table provided.
So a possible approach would be something along the lines of:
{% for col in dbt_utils.star(ref('my_model')).split(',') %}
...operation on {{ col | trim }}...
{% endfor %}
If you have the model node, and you have columns defined as model properties, this will work:
{% for col in model.columns.values() %}
... {{ col.name }} ... {{ col.data_type }} ...
{% endfor %}
You can get the model node from the graph:
{% set model = graph.nodes.values()
| selectattr("resource_type", "equalto", "model")
| selectattr("name", "equalto", model_name)
| first %}

dbt macro to extract MAX len of column

I essentially want to pass in a column and automatically calculate the precision and scale that I will pass to my CAST as Decimal(x,x) argument, so that I always account for the largest value and no rounding occurs.
I want to pass in a target column split on the period, calculate the max len() to the left, and then the max len() to the right, and then return (left+right, right) so it is something like 22,8.
Here is my macro so far:
{% macro cast_decimal(max_field, table_name) %}
{%- call statement('cast_decimal_max', fetch_result=True) -%}
WITH mq AS
(
SELECT MAX(len(split_part({{ max_field }},'.',1))) AS max_l,
MAX(len(split_part({{ max_field }},'.',2))) AS max_r
FROM {{ table_name }}
)
SELECT (max_l + max_r) || ',' || max_r AS max_total
FROM mq
{%- endcall %}
{%- set max_existing_total = load_result('cast_decimal_max').table.columns['max_total'].values()[0] -%}
{{ return(max_existing_total) }}
{%- endmacro %}
I keep getting this error: 'None' has no attribute 'table' and I am not sure what I am doing wrong.
Bonus points if you can tell me how to automatically pass in the current table from my FROM statement so I do not need it as an argument in my macro, instead of doing it like this:
SELECT {{ cast_decimal(close_price, public.stable_prices) }}
FROM public.stable_prices
{% macro cast_decimal(max_field, table_name) %}
{%- call statement('cast_decimal_max', fetch_result=True) -%}
WITH mq AS
(
SELECT MAX(len(split_part({{ max_field }},'.',1))) AS max_l,
MAX(len(split_part({{ max_field }},'.',2))) AS max_r
FROM {{ table_name }}
)
SELECT (max_l + max_r) || ',' || max_r AS max_total
FROM mq
{%- endcall %}
{% if execute %}
{%- set max_existing_total = "'" ~ load_result('cast_decimal_max').table.columns['max_total'].values()[0] ~ "'"-%}
{{ return(max_existing_total) }}
{% else %}
{{ return(false) }}
{% endif %}
{%- endmacro %}
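This solution works: wrap the result with "'" ~ ... ~ "'" and add the if execute check. During parsing, dbt does not run the statement, so load_result returns None, which is what caused the 'None' has no attribute 'table' error; the guard skips that code until the query has actually run.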
I can't test this since I don't have access to a postgres db (which I assume you're using?). What happens if you put your arguments in quotes in your call in the script? So like this,
SELECT {{ cast_decimal('close_price', 'public.stable_prices') }}
FROM public.stable_prices

DBT macro receives variable as string when dictionary is passed

I'm new to the dbt world and am facing a strange issue.
Database: Snowflake.
Below are two test models, one test CSV data file, and one test macro file.
seed data file : sample_data.csv
----------------------------------------------------------------
subdimension,datasource,datasource_label,scoring_metric,weight
facebook_impressions,pathmatics_facebook,Pathmatics Facebook,total_year_impressions,5
facebook_engagement,facebook,Facebook,total_year_interactions,2.5
facebook_engagement,facebook,Facebook,interactions_per_post,2.5
-------------------------------------------------------------------
I read data from the seed CSV file using a call statement and build a dictionary with fromjson(query_result).
In the TEST1 model, this logic is coded directly in the model. The dictionary is then passed to the macro scoring, which receives it as a dictionary, as it should.
Now take a look at the TEST2 model. It does the same thing; the only difference is that the dictionary is built by a macro called get_scoring_metrics.
TEST2 receives the dictionary from get_scoring_metrics and passes it on to the macro scoring. But this time scoring receives the data as a string instead of a dictionary and throws the exception "str object has no attribute items". If you compile the models, you will see it.
How is this possible? In both cases I'm using the same code.
I need the data in dictionary format to make a complicated model easier to develop.
Any solution will be appreciated. Thank you.
MODEL TEST1 (WORKS CORRECTLY)
-- depends_on: {{ ref('sample_data') }}
{%- set datasource = 'facebook' -%}
-- DICTIONARY THAT MAPS METRICS TO THEIR SUB-DIMENSIONS
{%- call statement('scoring_metric_query', fetch_result=True) -%}
SELECT TO_JSON(PARSE_JSON('{'||LISTAGG(diq_subdim,',')||'}')) AS diq_metric
FROM
(
SELECT subdim_name||':'||'['||scoring_metric||']' AS diq_subdim
FROM
(
SELECT '"'||subdimension ||'"' AS subdim_name,
LISTAGG(''''||scoring_metric ||'''' , ',') AS scoring_metric
FROM {{ ref('sample_data') }}
WHERE lower(datasource) = lower('{{ datasource }}')
GROUP BY subdimension
) T1
) T2
{%- endcall -%}
{% if execute %}
{%- set query_result = load_result('scoring_metric_query')['data'][0][0] -%}
{% set metric_dict = fromjson(query_result) %}
{% endif %}
{{
scoring(
datasource = datasource,
metric_dict = metric_dict
)
}}
MODEL TEST2 (throws an exception during compilation even though it uses the same code as model TEST1)
{%- set datasource = 'facebook' -%}
{%- set metric_dict = get_scoring_metrics(datasource) -%}
{{
scoring(
datasource = datasource,
metric_dict = metric_dict
)
}}
macro scoring
{% macro scoring(datasource, metric_dict={} ) %}
-- The query below means nothing. It's just an example to show that when this macro is called from the test2.sql model,
-- it receives metric_dict as a string (error: str object has no attribute items).
-- When it's called from the test1.sql model, it receives metric_dict as a Python dictionary.
{% for i , j in metric_dict.items() %}
select {{ i }} , {{ j }}
{% endfor %}
{% endmacro %}
macro get_scoring_metrics
{% macro get_scoring_metrics(datasource) %}
-- DICTIONARY THAT MAPS METRICS TO THEIR SUB-DIMENSIONS
{%- call statement('scoring_metric_query', fetch_result=True) -%}
SELECT TO_JSON(PARSE_JSON('{'||LISTAGG(diq_subdim,',')||'}')) AS diq_metric
FROM
(
SELECT subdim_name||':'||'['||scoring_metric||']' AS diq_subdim
FROM
(
SELECT '"'||subdimension ||'"' AS subdim_name,
LISTAGG(''''||scoring_metric ||'''' , ',') AS scoring_metric
FROM {{ ref('sample_data') }}
WHERE lower(datasource) = lower('{{ datasource }}')
GROUP BY subdimension
) T1
) T2
{%- endcall -%}
{% if execute %}
{%- set query_result = load_result('scoring_metric_query')['data'][0][0] -%}
{% set metric_dict = fromjson(query_result) %}
{{ return(metric_dict) }}
{% endif %}
{% endmacro %}
-----------------------------------------------------------------

Macro to surface models to other schemas - dbt_utils.star()

Problem
Currently in my CI process, I am surfacing specific models built to multiple schemas. This is generally my current process.
macros/surface_models.sql
{% set model_views = [] %}
{% for node in graph.nodes.values() %}
{% if some type of filtering criteria %}
{%- do model_views.append(node.alias) -%}
{% endif %}
{% endfor %}
{% for view in model_views %}
{% set query %}
create view my_other_schema.{{ view }} as (select * from initial_schema.{{ view }});
{% endset %}
{% do run_query(query) %}
{% endfor %}
While this works, if the underlying table/view's definition changes, the view created from the above macro will return an error like: QUERY EXPECTED X COLUMNS BUT GOT Y
I could fix this by writing each query with explicit column names:
select id, updated_at from table
not
select * from table
Question
Is there a way to utilize the above macro concept but using {{ dbt_utils.star() }} instead of *?
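One possible sketch, under a few assumptions: dbt_utils.star expects a relation object rather than a plain string, so each source view is wrapped with api.Relation.create first. The macro name surface_views_with_columns and its arguments are hypothetical, model_views is the list built by the first loop above, and the source views are assumed to live in target.database and to exist when the macro runs. It uses create or replace view, which differs from the create view in the question.

```sql
{# Hypothetical macro name and arguments; not part of dbt or dbt_utils. #}
{% macro surface_views_with_columns(model_views, source_schema='initial_schema', target_schema='my_other_schema') %}
    {% for view in model_views %}
        {# Build a Relation, since dbt_utils.star takes a relation, not a string. #}
        {% set source_relation = api.Relation.create(
            database=target.database,
            schema=source_schema,
            identifier=view
        ) %}
        {% set query %}
            create or replace view {{ target_schema }}.{{ view }} as (
                select {{ dbt_utils.star(from=source_relation) }}
                from {{ source_relation }}
            )
        {% endset %}
        {% do run_query(query) %}
    {% endfor %}
{% endmacro %}
```

Because star looks up the columns at run time, each view is created with whatever explicit column list the source has at that moment.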