DBT macro receives variable as string when dictionary is passed - dbt

I'm new to the dbt world and facing a strange issue.
Database: Snowflake.
Attached are two test models, one test CSV seed file, and the test macros.
Seed data file: sample_data.csv
----------------------------------------------------------------
subdimension,datasource,datasource_label,scoring_metric,weight
facebook_impressions,pathmatics_facebook,Pathmatics Facebook,total_year_impressions,5
facebook_engagement,facebook,Facebook,total_year_interactions,2.5
facebook_engagement,facebook,Facebook,interactions_per_post,2.5
-------------------------------------------------------------------
I'm reading data from the seed CSV file using a call statement block and creating a dictionary with fromjson(query_result).
If you look at the TEST1 model, this read-into-a-dictionary logic is coded inline. The dictionary is then passed to the scoring macro, and the macro receives it as a dictionary, as it should.
Now take a look at the TEST2 model. It does the same thing, but the dictionary is built through a macro called get_scoring_metrics.
Here TEST2 receives the dictionary from get_scoring_metrics and passes it on to the scoring macro. But this time scoring receives the data as a string instead of a dictionary and throws the exception 'str' object has no attribute 'items'. If you compile the models, you will see it.
How is this possible? In both cases I'm using the same code.
I need the data in dictionary format for ease of developing a complicated model.
Any solution will be appreciated. Thank you.
MODEL TEST1 (WORKS CORRECTLY)
-- depends_on: {{ ref('sample_data') }}
{%- set datasource = 'facebook' -%}
-- DICTIONARY THAT MAPS METRICS TO THEIR SUB-DIMENSIONS
{%- call statement('scoring_metric_query', fetch_result=True) -%}
SELECT TO_JSON(PARSE_JSON('{'||LISTAGG(diq_subdim,',')||'}')) AS diq_metric
FROM
(
SELECT subdim_name||':'||'['||scoring_metric||']' AS diq_subdim
FROM
(
SELECT '"'||subdimension ||'"' AS subdim_name,
LISTAGG(''''||scoring_metric ||'''' , ',') AS scoring_metric
FROM {{ ref('sample_data') }}
WHERE lower(datasource) = lower('{{ datasource }}')
GROUP BY subdimension
) T1
) T2
{%- endcall -%}
{% if execute %}
{%- set query_result = load_result('scoring_metric_query')['data'][0][0] -%}
{% set metric_dict = fromjson(query_result) %}
{% endif %}
{{
scoring(
datasource = datasource,
metric_dict = metric_dict
)
}}
MODEL TEST2 (throws an exception during compilation even though it uses the same code as model TEST1)
{%- set datasource = 'facebook' -%}
{%- set metric_dict = get_scoring_metrics(datasource) -%}
{{
scoring(
datasource = datasource,
metric_dict = metric_dict
)
}}
macro scoring
{% macro scoring(datasource, metric_dict={}) %}
--The query below means nothing; it's just an example to show that when this macro is called from the test2.sql model, metric_dict does not
--behave as a dictionary: the macro receives metric_dict as a string (error: 'str' object has no attribute 'items').
--But when it's called from the test1.sql model, it receives metric_dict as a Python dictionary.
{% for i , j in metric_dict.items() %}
select {{ i }} , {{ j }}
{% endfor %}
{% endmacro %}
macro get_scoring_metrics
{% macro get_scoring_metrics(datasource) %}
-- DICTIONARY THAT MAPS METRICS TO THEIR SUB-DIMENSIONS
{%- call statement('scoring_metric_query', fetch_result=True) -%}
SELECT TO_JSON(PARSE_JSON('{'||LISTAGG(diq_subdim,',')||'}')) AS diq_metric
FROM
(
SELECT subdim_name||':'||'['||scoring_metric||']' AS diq_subdim
FROM
(
SELECT '"'||subdimension ||'"' AS subdim_name,
LISTAGG(''''||scoring_metric ||'''' , ',') AS scoring_metric
FROM {{ ref('sample_data') }}
WHERE lower(datasource) = lower('{{ datasource }}')
GROUP BY subdimension
) T1
) T2
{%- endcall -%}
{% if execute %}
{%- set query_result = load_result('scoring_metric_query')['data'][0][0] -%}
{% set metric_dict = fromjson(query_result) %}
{{ return(metric_dict) }}
{% endif %}
{% endmacro %}
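A likely explanation, going by dbt's documented macro behavior (hedged, since I can't see the full project): a macro that never reaches a return() call hands back its rendered text as a string, and execute is false while dbt parses the project, so on that first pass get_scoring_metrics skips the return and hands scoring its whitespace-and-comment output instead of a dictionary. A minimal sketch of a fix, using the same if execute / else pattern as the answers further down, is to make the macro return a dictionary on both passes:
{% if execute %}
{%- set query_result = load_result('scoring_metric_query')['data'][0][0] -%}
{% set metric_dict = fromjson(query_result) %}
{{ return(metric_dict) }}
{% else %}
{# parse pass: return an empty dict so callers still receive a dictionary #}
{{ return({}) }}
{% endif %}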
-----------------------------------------------------------------

Related

Invalid type for parameter 'TO_GEOGRAPHY'

Why does casting
select cast(st_makepoint(-90.345929, 37.278424) as geography)
raise the following error:
SQL compilation error: invalid type [CAST(ST_MAKEPOINT(TO_DOUBLE(-90.345929), TO_DOUBLE(37.278424)) AS GEOGRAPHY)] for parameter 'TO_GEOGRAPHY'
While a seemingly more direct pass of the st_makepoint result to to_geography does not?
select to_geography(st_makepoint(-90.345929, 37.278424))
I'm fairly sure I'm stuck with the casting behavior in the dbt tool I'm using. Basically I'm trying to union a bunch of tables with this geography field; in the compiled SQL the casting comes from dbt's union_relations macro, and I don't seem to be able to control whether the casting occurs.
The source for union_relations is here.
You can copy this macro into your own project (under the macros directory) and patch the source, and then call it with union_relations instead of dbt_utils.union_relations.
The offending lines are 106-113. Something like this should work fine:
{% for col_name in ordered_column_names -%}
{%- set col = column_superset[col_name] %}
{%- set col_type = column_override.get(col.column, col.data_type) %}
{%- set col_name = adapter.quote(col_name) if col_name in relation_columns[relation] else 'null' %}
{% if col_type == 'geography' %}
to_geography({{ col_name }}) as {{ col.quoted }}
{% else %}
cast({{ col_name }} as {{ col_type }}) as {{ col.quoted }}
{% endif %}
{%- if not loop.last %},{% endif -%}
{%- endfor %}
Because CAST doesn't support that particular combination of source and target data types.

DBT set variable using macros

My goal is to get the last two dates from a table and run an insert_overwrite incremental load on a large table. I am trying to set a variable inside the model by calling a macro I wrote. The SQL query is for BigQuery.
I get an error message.
'None' has no attribute 'table'
inside model
{% set dates = get_last_two_dates('window_start',source('raw.event','tmp')) %}
macros
{% macro get_last_two_dates(target_column_name, target_table = this) %}
{% set query %}
select string_agg(format('%T',target_date),',') target_date_string
from (
SELECT distinct date({{ target_column_name }}) target_date
FROM {{ target_table }}
order by 1 desc
LIMIT 2
) a
{% endset %}
{% set max_value = run_query(query).columns[0][0] %}
{% do return(max_value) %}
{% endmacro %}
Thanks in advance. Let me know if you have any other questions.
You probably need to wrap {% set max_value ... %} with an {% if execute %} block:
{% macro get_last_two_dates(target_column_name, target_table = this) %}
{% set query %}
select string_agg(format('%T',target_date),',') target_date_string
from (
SELECT distinct date({{ target_column_name }}) target_date
FROM {{ target_table }}
order by 1 desc
LIMIT 2
) a
{% endset %}
{% if execute %}
{% set max_value = run_query(query).columns[0][0] %}
{% else %}
{% set max_value = "" %}
{% endif %}
{% do return(max_value) %}
{% endmacro %}
The reason for this is that your macro actually gets run twice -- once when dbt is scanning all of the models to build the DAG, and a second time when the model is actually run. execute is only true for this second pass.
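A minimal way to see the two passes for yourself (log and execute are both built into dbt's Jinja context): drop this line into any model and run dbt run; it prints execute is False during the parse pass and execute is True when the model actually runs:
{{ log('execute is ' ~ execute, info=true) }}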

dbt macro to extract MAX len of column

I essentially want to pass in a column, then run a calculation that automatically works out the numbers I will eventually pass to my CAST as DECIMAL(x,x) argument, so I am always accounting for the largest decimal and no rounding occurs.
I want to split the target column on the period, calculate the max len() to the left and the max len() to the right, and then return (left+right, right), so it is something like 22,8.
Here is my macro so far:
{% macro cast_decimal(max_field, table_name) %}
{%- call statement('cast_decimal_max', fetch_result=True) -%}
WITH mq AS
(
SELECT MAX(len(split_part({{ max_field }},'.',1))) AS max_l,
MAX(len(split_part({{ max_field }},'.',2))) AS max_r
FROM {{ table_name }}
)
SELECT (max_l + max_r) || ',' || max_r AS max_total
FROM mq
{%- endcall %}
{%- set max_existing_total = load_result('cast_decimal_max').table.columns['max_total'].values()[0] -%}
{{ return(max_existing_total) }}
{%- endmacro %}
I keep getting this error: 'None' has no attribute 'table', and I am not sure what I am doing wrong.
Bonus points if you can tell me how to automatically pass in the current table from my FROM statement so I do not need it as an argument in my macro, instead of doing it like this:
SELECT {{ cast_decimal(close_price, public.stable_prices) }}
FROM public.stable_prices
{% macro cast_decimal(max_field, table_name) %}
{%- call statement('cast_decimal_max', fetch_result=True) -%}
WITH mq AS
(
SELECT MAX(len(split_part({{ max_field }},'.',1))) AS max_l,
MAX(len(split_part({{ max_field }},'.',2))) AS max_r
FROM {{ table_name }}
)
SELECT (max_l + max_r) || ',' || max_r AS max_total
FROM mq
{%- endcall %}
{% if execute %}
{%- set max_existing_total = "'" ~ load_result('cast_decimal_max').table.columns['max_total'].values()[0] ~ "'"-%}
{{ return(max_existing_total) }}
{% else %}
{{ return(false) }}
{% endif %}
{%- endmacro %}
This solution works: wrapping the result with "'" ~ and adding the {% if execute %} block.
I can't test this since I don't have access to a Postgres DB (which I assume you're using?). What happens if you put your arguments in quotes when you call the macro? Like this:
SELECT {{ cast_decimal('close_price', 'public.stable_prices') }}
FROM public.stable_prices
(Sorry, this should have been a comment, but I can't add comments yet!)

Fixing the error - cannot unpack non-iterable NoneType object in a DBT macro using Jinja

I am writing a macro that uses Snowflake to unload a prebuilt table to an S3 bucket. Normally we have just one table, but in this instance I have 5 or 6 tables to unload into the same S3 bucket. I wrote a for loop to iterate through a list of dictionaries, where file is the name of the file written to S3 and table is the Snowflake table to unload. The code works, but after the unload I keep getting the error cannot unpack non-iterable NoneType object, which makes me think the loop is trying to run one last time.
The code I have is as follows:
{% macro send_anaplan_data_to_s3() %}
{{ log('Running Export to S3 Macro ...', info = true) }}
{% set table_names=[{"file":" '/file_name.csv' ", "table":"DATABASE.SCHEMA.TABLE"}] %}
{% for name in table_names %}
{{ log(name.file, info = true) }}
{{ log(name.table, info = true) }}
-----first table-----
-- this block makes the s3 path with dated filename
{%- call statement('send_s3_statement', fetch_result=True) -%}
select concat('s3://data-anaplan',{{ name.file }})
{%- endcall -%}
-- first compiles then executes against db
-- so we need if/else otherwise it will fail on
-- compile when accessing .data[0]
{%- if execute -%}
{%- set result = load_result('send_s3_statement').data[0] -%}
{%- else -%}
{%- set result = [] -%}
{%- endif -%}
-- spot check the resulting filename.
{{ log('resulting filename:', info=True )}}
{{ log(result, info=True )}}
-- send the data to the correct location in S3
{% for r in result -%}
{{ log(r, info=true) }}
{{ log('Unloading to s3', info = true) }}
{%- call statement(auto_begin=true) -%}
COPY INTO '{{r}}' from {{ name.table }}
STORAGE_INTEGRATION = S3_SNOWFLAKE_ANAPLAN_INTEGRATION
SINGLE = TRUE
MAX_FILE_SIZE = 4900000000
OVERWRITE = TRUE
FILE_FORMAT = (TYPE = CSV, FIELD_DELIMITER = ',', FIELD_OPTIONALLY_ENCLOSED_BY = '"', COMPRESSION = NONE, NULL_IF=())
HEADER = TRUE
{%- endcall -%}
{%- if not loop.last -%} , {%- endif %}
{% endfor -%}
-- confirm when done
{{log('Finished.', info=True)}}
{% endfor %}
{% endmacro %}
Any ideas here? Thank you!
This error implies that one of the queries you pass to the db is empty/null. Check the db logs to see where that's happening.
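If an occasionally empty result is expected, one hedged guard (a sketch reusing the names from the macro above; note that load_result() also returns None for a statement that never ran) is to fall back to an empty list before the inner loop, so the COPY is simply skipped:
{%- set res = load_result('send_s3_statement') if execute else none -%}
{%- set result = res.data[0] if res and res.data else [] -%}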

Macro to surface models to other schemas - dbt_utils.star()

Problem
Currently in my CI process, I surface specific built models to multiple other schemas. This is generally my current process.
macros/surface_models.sql
{% set model_views = [] %}
{% for node in graph.nodes.values() %}
{% if some type of filtering criteria %}
{%- do model_views.append(node.alias) -%}
{% endif %}
{% endfor %}
{% for view in model_views %}
{% set query %}
create view my_other_schema.{{ view }} as (select * from initial_schema.{{ view }});
{% endset %}
{{ run_query(query) }}
{% endfor %}
While this works, if the underlying table or view's definition changes, the view created by the macro above will return an error like QUERY EXPECTED X COLUMNS BUT GOT Y.
I could fix this by writing each query with explicit column names:
select id, updated_at from table
not
select * from table
Question
Is there a way to utilize the above macro concept but using {{ dbt_utils.star() }} instead of *?
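Possibly, with a caveat. dbt_utils.star(from=...) takes a Relation and returns a comma-separated column list (it needs execute to be true, so run it from a run-operation or hook). A sketch under those assumptions, reusing model_views from above and building each Relation with api.Relation.create (schema names as in the original macro):
{% for view in model_views %}
{% set rel = api.Relation.create(schema='initial_schema', identifier=view) %}
{% set query %}
create or replace view my_other_schema.{{ view }} as (
select {{ dbt_utils.star(from=rel) }} from {{ rel }}
);
{% endset %}
{% do run_query(query) %}
{% endfor %}
Note that star() expands the column list when the view is created, so the view still has to be recreated after the source changes; the gain is an explicit column list in the view definition instead of select *.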