dbt macro to extract MAX len of column

I want to pass in a column and automatically calculate the precision and scale that I will eventually pass to my CAST as Decimal(x,x) argument, so that the cast always accommodates the largest value and no rounding occurs.
The idea is to split the target column on the period, calculate the max len() to the left and the max len() to the right, and then return (left+right, right), for example 22,8.
Here is my macro so far:
{% macro cast_decimal(max_field, table_name) %}
{%- call statement('cast_decimal_max', fetch_result=True) -%}
WITH mq AS
(
SELECT MAX(len(split_part({{ max_field }},'.',1))) AS max_l,
MAX(len(split_part({{ max_field }},'.',2))) AS max_r
FROM {{ table_name }}
)
SELECT (max_l + max_r) || ',' || max_r AS max_total
FROM mq
{%- endcall %}
{%- set max_existing_total = load_result('cast_decimal_max').table.columns['max_total'].values()[0] -%}
{{ return(max_existing_total) }}
{%- endmacro %}
I keep getting this error: 'None' has no attribute 'table' and I am not sure what I am doing wrong.
Bonus points if you can tell me how to automatically pass in the current table from my FROM statement so I do not need it as an argument in my macro, instead of doing it like this:
SELECT {{ cast_decimal(close_price, public.stable_prices) }}
FROM public.stable_prices
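
The calculation the macro is aiming for can be sketched outside dbt. The following is a minimal Python sketch, assuming the column values arrive as decimal strings; the function name decimal_precision_scale and the sample prices are hypothetical and only illustrate the (left+right, right) rule described in the question.

```python
from typing import Iterable, Tuple


def decimal_precision_scale(values: Iterable[str]) -> Tuple[int, int]:
    """Return (precision, scale) wide enough for every decimal string in values."""
    max_left = 0
    max_right = 0
    for value in values:
        # partition() gives an empty right-hand part when there is no period,
        # much like split_part(value, '.', 2) yielding an empty string.
        left, _, right = value.partition(".")
        max_left = max(max_left, len(left))
        max_right = max(max_right, len(right))
    # Precision is the total digit count, scale is the digits after the period.
    return max_left + max_right, max_right


# Hypothetical sample values standing in for the close_price column.
prices = ["12345.678", "0.00000001", "99"]
precision, scale = decimal_precision_scale(prices)
print(f"DECIMAL({precision},{scale})")
```

Note that, as in the SQL version, a leading minus sign would be counted as part of the left-hand length.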

{% macro cast_decimal(max_field, table_name) %}
{%- call statement('cast_decimal_max', fetch_result=True) -%}
WITH mq AS
(
SELECT MAX(len(split_part({{ max_field }},'.',1))) AS max_l,
MAX(len(split_part({{ max_field }},'.',2))) AS max_r
FROM {{ table_name }}
)
SELECT (max_l + max_r) || ',' || max_r AS max_total
FROM mq
{%- endcall %}
{% if execute %}
{%- set max_existing_total = "'" ~ load_result('cast_decimal_max').table.columns['max_total'].values()[0] ~ "'"-%}
{{ return(max_existing_total) }}
{% else %}
{{ return(false) }}
{% endif %}
{%- endmacro %}
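This solution works. It makes two changes: it wraps the result in quotes with "'" ~ ... ~ "'", and it guards the load_result call with if execute. During dbt's parse phase, execute is false and no query has run, so load_result returns None, which is what caused the 'None' has no attribute 'table' error.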
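
Why the if execute guard matters can be illustrated with a toy model. This is only a sketch in Python, not dbt's real implementation: the ResultStore class and the hard-coded '22,8' value are hypothetical stand-ins for dbt's statement results and for the query output.

```python
from typing import Dict, Optional, Union


class ResultStore:
    """Hypothetical stand-in for the place where statement results are kept."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def store_result(self, name: str, value: str) -> None:
        self._results[name] = value

    def load_result(self, name: str) -> Optional[str]:
        # Returns None when nothing has been stored under this name.
        return self._results.get(name)


def cast_decimal(store: ResultStore, execute: bool) -> Union[str, bool]:
    """Mimic the guarded macro: only touch the result in the execute pass."""
    if not execute:
        # Parse pass: no query has run, so there is nothing to load yet.
        return False
    # Execute pass: the query runs and its result becomes available.
    store.store_result("cast_decimal_max", "22,8")
    result = store.load_result("cast_decimal_max")
    return "'" + result + "'"


print(cast_decimal(ResultStore(), execute=False))
print(cast_decimal(ResultStore(), execute=True))
```

Without the guard, the parse pass would try to read attributes of the None returned by load_result, which is the situation the original error message describes.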

I can't test this since I don't have access to a Postgres database (which I assume you're using?). What happens if you put the arguments in quotes when you call the macro? Like this:
SELECT {{ cast_decimal('close_price', 'public.stable_prices') }}
FROM public.stable_prices
(sorry this should have been a comment but can't add comments yet!)

Related

Invalid type for parameter 'TO_GEOGRAPHY'

Why does casting
select cast(st_makepoint(-90.345929, 37.278424) as geography)
raise the following error:
SQL compilation error: invalid type [CAST(ST_MAKEPOINT(TO_DOUBLE(-90.345929), TO_DOUBLE(37.278424)) AS GEOGRAPHY)] for parameter 'TO_GEOGRAPHY'
While a seemingly more direct pass of the st_makepoint result to to_geography does not?
select to_geography(st_makepoint(-90.345929, 37.278424))
I'm fairly sure I'm stuck with the casting behavior in the dbt tool I'm using. Basically I'm trying to union a bunch of tables with this geography field, and in the compiled SQL this casting logic appears as a function of dbt's union_relations macro, and I don't seem to be able to control whether the casting occurs.

The source for union_relations is here.
You can copy this macro into your own project (under the macros directory) and patch the source, and then call it with union_relations instead of dbt_utils.union_relations.
The offending lines are 106-113. Something like this should work fine:
{% for col_name in ordered_column_names -%}
{%- set col = column_superset[col_name] %}
{%- set col_type = column_override.get(col.column, col.data_type) %}
{%- set col_name = adapter.quote(col_name) if col_name in relation_columns[relation] else 'null' %}
{% if col_type | lower == 'geography' %}
to_geography({{ col_name }}) as {{ col.quoted }}
{% else %}
cast({{ col_name }} as {{ col_type }}) as {{ col.quoted }}
{% endif %}
{%- if not loop.last %},{% endif -%}
{%- endfor %}
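
The branching in the patched loop can be sketched in isolation. This is a minimal Python illustration, assuming the type name may arrive in any letter case; render_cast is a hypothetical helper, not part of dbt or dbt_utils.

```python
def render_cast(column_sql: str, column_type: str, alias: str) -> str:
    """Build one select-list entry, using to_geography() for geography columns."""
    # Compare case-insensitively (an assumption: the warehouse may report
    # the type name in upper case).
    if column_type.lower() == "geography":
        return f"to_geography({column_sql}) as {alias}"
    # Every other type keeps the plain cast used by the original macro.
    return f"cast({column_sql} as {column_type}) as {alias}"


print(render_cast('"GEO"', "GEOGRAPHY", '"GEO"'))
print(render_cast("null", "varchar", '"NAME"'))
```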

Because CAST doesn't support that particular combination of source and target data types.

With dbt how do I use a model cte in a macro call in a jinja expression?

How would I reference a previously defined cte inside of a macro call, inside of a jinja expression block?
with stg_example_table as (
select *
from {{ ref('stg_db__example_table') }}
where {{ ref('stg_db__example_table') }}.example_column = 'foobar'
),
earliest_date as (
THIS
{{ vvvvvvvvvvvvvvvvv
get_earliest_date('month', 'created_at', stg_example_table)
}} ^^^^^^^^^^^^^^^^^
)
select * from earliest_date
I can't seem to reference the cte, stg_example_table, in that location in a way that works. How should it be referenced in a way that will work?
The macro get_earliest_date() works if I use ref('stg_db__example_table'), but then I get the wrong value, since the table isn't filtered the way the cte is.
I could create another stg model that has the filters I need and ref() that one, but it'd be nice to just use the cte here.
I have also tried various forms of:
{% set earliest_date = run_query("select min(created_at)::date from stg_db__example_table").columns[0][0] %}
And then referencing the set earliest_date, but I could not get that to work.
For reference, this is the get_earliest_date() macro:
{% macro get_earliest_date(date_component, column_name, relation) %}
{% set query %}
select
date_trunc({{ date_component }}, min({{ column_name }}))::date as earliest
from {{ relation }}
{% endset %}
{% set results = run_query(query) %}
{% if execute %}
{% set result = results.columns[0][0] %}
{% else %}
{% set result = null %}
{% endif %}
{{ return(result) }}
{% endmacro %}
The example code is simplified, but eventually I want to get a date_spine() with:
{{
dbt_utils.date_spine(
datepart = "month",
start_date = get_earliest_date('month', 'created_at', stg_example_table),
end_date = "date_trunc('month', current_date())"
)
}}

dbt doesn't parse the SQL in your model, so it simply doesn't know what stg_example_table is. Inside the Jinja expression, that bare name is just an undefined variable.
If you're looking to re-use a CTE, it should probably be its own model (a macro would be another choice). You can use the ephemeral materialization and dbt won't persist anything to your warehouse -- it just interpolates the model as a CTE. There are some limitations for how you can ref an ephemeral model, but I think in this case, since you're calling get_earliest_date from a model, it should work fine.
-- in stg_example_table.sql
{{ config(materialized="ephemeral") }}
select *
from {{ ref('stg_db__example_table') }}
where {{ ref('stg_db__example_table') }}.example_column = 'foobar'
-- in your_model.sql
...
{{
dbt_utils.date_spine(
datepart = "month",
start_date = get_earliest_date('month', 'created_at', ref('stg_example_table')),
end_date = "date_trunc('month', current_date())"
)
}}

DBT macro receives variable as string when dictionary is passed

I'm new to the dbt world and facing a strange issue.
Database: Snowflake.
Below are two test models, one test CSV data file, and one test macro file.
Seed data file: sample_data.csv
----------------------------------------------------------------
subdimension,datasource,datasource_label,scoring_metric,weight
facebook_impressions,pathmatics_facebook,Pathmatics Facebook,total_year_impressions,5
facebook_engagement,facebook,Facebook,total_year_interactions,2.5
facebook_engagement,facebook,Facebook,interactions_per_post,2.5
-------------------------------------------------------------------
I read data from the seed CSV file using a call statement and build a dictionary with fromjson(query_result).
In the TEST1 model, this logic is written inline. The dictionary is then passed to the macro scoring, which receives it as a dictionary, as it should.
Now take a look at the TEST2 model. It does the same thing, except that the dictionary is built by a macro called get_scoring_metrics.
TEST2 receives the dictionary from get_scoring_metrics and passes it on to the macro scoring. This time, however, scoring receives a string instead of a dictionary and throws the exception str object has no attribute items. You will see it if you compile the models.
How is this possible? Both cases use the same code.
I need the data in dictionary format to make a complicated model easier to develop.
Any solution would be appreciated. Thank you.
MODEL TEST1 (WORKS CORRECTLY)
-- depends_on: {{ ref('sample_data') }}
{%- set datasource = 'facebook' -%}
-- DICTIONARY THAT MAPS METRICS TO THEIR SUB-DIMENSIONS
{%- call statement('scoring_metric_query', fetch_result=True) -%}
SELECT TO_JSON(PARSE_JSON('{'||LISTAGG(diq_subdim,',')||'}')) AS diq_metric
FROM
(
SELECT subdim_name||':'||'['||scoring_metric||']' AS diq_subdim
FROM
(
SELECT '"'||subdimension ||'"' AS subdim_name,
LISTAGG(''''||scoring_metric ||'''' , ',') AS scoring_metric
FROM {{ ref('sample_data') }}
WHERE lower(datasource) = lower('{{ datasource }}')
GROUP BY subdimension
) T1
) T2
{%- endcall -%}
{% if execute %}
{%- set query_result = load_result('scoring_metric_query')['data'][0][0] -%}
{% set metric_dict = fromjson(query_result) %}
{% endif %}
{{
scoring(
datasource = datasource,
metric_dict = metric_dict
)
}}
MODEL TEST2 (throws an exception during compilation even though it uses the same code as model TEST1)
{%- set datasource = 'facebook' -%}
{%- set metric_dict = get_scoring_metrics(datasource) -%}
{{
scoring(
datasource = datasource,
metric_dict = metric_dict
)
}}
macro scoring
{% macro scoring(datasource, metric_dict={} ) %}
-- The query below means nothing. It is just an example to show that when this macro is called from the test2.sql model,
-- it receives metric_dict as a string (error: str object has no attribute items).
-- When it is called from the test1.sql model, it receives metric_dict as a Python dictionary.
{% for i , j in metric_dict.items() %}
select {{ i }} , {{ j }}
{% endfor %}
{% endmacro %}
macro get_scoring_metrics
{% macro get_scoring_metrics(datasource) %}
-- DICTIONARY THAT MAPS METRICS TO THEIR SUB-DIMENSIONS
{%- call statement('scoring_metric_query', fetch_result=True) -%}
SELECT TO_JSON(PARSE_JSON('{'||LISTAGG(diq_subdim,',')||'}')) AS diq_metric
FROM
(
SELECT subdim_name||':'||'['||scoring_metric||']' AS diq_subdim
FROM
(
SELECT '"'||subdimension ||'"' AS subdim_name,
LISTAGG(''''||scoring_metric ||'''' , ',') AS scoring_metric
FROM {{ ref('sample_data') }}
WHERE lower(datasource) = lower('{{ datasource }}')
GROUP BY subdimension
) T1
) T2
{%- endcall -%}
{% if execute %}
{%- set query_result = load_result('scoring_metric_query')['data'][0][0] -%}
{% set metric_dict = fromjson(query_result) %}
{{ return(metric_dict) }}
{% endif %}
{% endmacro %}
-----------------------------------------------------------------
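
The exception itself can be reproduced outside dbt, which helps when checking which type actually reaches scoring. This is a minimal Python sketch, assuming the query returns JSON text of the shape shown (inferred from the seed data for datasource facebook) and using json.loads as a stand-in for fromjson; has_items is a hypothetical helper.

```python
import json

# Assumed shape of the JSON text returned by the query for 'facebook'.
query_result = (
    '{"facebook_engagement": '
    '["total_year_interactions", "interactions_per_post"]}'
)

# Parsing the text yields a dictionary, which supports .items().
metric_dict = json.loads(query_result)
pairs = list(metric_dict.items())


def has_items(value: object) -> bool:
    """Report whether a value can be iterated with .items() like a dict."""
    return hasattr(value, "items")


print(has_items(metric_dict))   # the parsed dictionary
print(has_items(query_result))  # the raw string, which would fail in the loop
```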

Macro to surface models to other schemas - dbt_utils.star()

Problem
In my CI process, I currently surface specific built models to multiple schemas. This is roughly how I do it:
macros/surface_models.sql
{% set model_views = [] %}
{% for node in graph.nodes.values() %}
{% if some type of filtering criteria %}
{# collect the alias of each matching node in the same list that is looped over below #}
{%- do model_views.append(node.alias) -%}
{% endif %}
{% endfor %}
{% for view in model_views %}
{% set query %}
create view my_other_schema.{{ view }} as (select * from initial_schema.{{ view }});
{% endset %}
{% do run_query(query) %}
{% endfor %}
While this works, if the underlying table or view's definition changes, the view created by the above macro will return an error like: QUERY EXPECTED X COLUMNS BUT GOT Y
I could fix this by writing each query with explicit column names:
select id, updated_at from table
not
select * from table
Question
Is there a way to utilize the above macro concept but using {{ dbt_utils.star() }} instead of *?
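
Whatever produces the column list, the statement being generated has the following shape. This is a minimal Python sketch of the string building only; build_view_sql and the orders table are hypothetical, and it does not show how dbt_utils.star() itself behaves.

```python
from typing import Sequence


def build_view_sql(
    target_schema: str,
    source_schema: str,
    name: str,
    columns: Sequence[str],
) -> str:
    """Build a create-view statement that names every column explicitly."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    return (
        f"create view {target_schema}.{name} as "
        f"(select {column_list} from {source_schema}.{name});"
    )


# Hypothetical table name and columns, following the id/updated_at example.
print(build_view_sql("my_other_schema", "initial_schema", "orders", ["id", "updated_at"]))
```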

Concatenate columns using a macro in DBT for Redshift

I want to concatenate a few columns using column1 ^^ column2 ^^ ... syntax in DBT for Redshift. If a column contains NULL, ## should be used in its place, resulting in, for example, ## ^^ ##. I have found the following macro for concatenation:
{% macro safe_concat(field_list) %}
{# Takes an input list and generates a concat() statement with each argument in the list safe_casted to a string and wrapped in an ifnull() #}
concat({% for f in field_list %}
ifnull(safe_cast({{ f }} as string), '##')
{% if not loop.last %}, {% endif %}
{% endfor %})
{% endmacro %}
When I use it in my select statement:
select
{{ safe_concat([street, city]) }} as address_key
from source
I get the following error. Is this related to the code I am using?
Database Error in model address (models/address.sql)
syntax error at or near "as"
LINE 32: ifnull(safe_cast( as string), '##')

Try wrapping your column names in quotes when you pass them to the macro, for example safe_concat(['street', 'city']). Because you're already inside the curly braces, I think Jinja is treating street and city as variables, which don't exist and so evaluate to None.

You can build each piece as a string with Jinja's ~ concatenation operator, append it to a list inside the loop, and join the list at the end.
{% set query_results = [] %}
{% for f in field_list %}
{# build the SQL fragment as a string; ~ concatenates strings in Jinja #}
{% set x = "ifnull(safe_cast(" ~ f ~ " as string), '##')" %}
{# use do rather than set here, because append() returns None #}
{% do query_results.append(x) %}
{% endfor %}
...
{{ return(query_results | join(" || '^^' || ")) }}
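
The string the macro needs to emit can be sketched in plain Python. This is only an illustration under stated assumptions: it swaps in coalesce, cast ... as varchar, and the || operator (all available in Redshift) in place of ifnull and safe_cast, and the name safe_concat_sql is hypothetical.

```python
from typing import Sequence


def safe_concat_sql(
    field_list: Sequence[str],
    separator: str = " ^^ ",
    null_marker: str = "##",
) -> str:
    """Build a SQL expression joining columns, with a marker in place of NULLs."""
    if not field_list:
        raise ValueError("field_list must not be empty")
    # Each column is cast to text and replaced by the marker when it is NULL.
    parts = [
        f"coalesce(cast({field} as varchar), '{null_marker}')"
        for field in field_list
    ]
    # The separator is emitted as a SQL string literal between the parts.
    return f" || '{separator}' || ".join(parts)


# Column names are passed as strings, as the first answer suggests.
print(safe_concat_sql(["street", "city"]))
```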
