dbt macros: error when calling a macro inside a marts model

I have created a macro to update a table from a join.
1. Under the macros folder I created an update_table.sql file with the code below:
{% macro update_table %}
{% macro update_table_ATTRIBUTE6Desc %}
UPDATE CR
SET ATTRIBUTE6Desc = A.Description
from {{'table_A'}} CR
inner join {{'Table_B'}} A
on CR.ATTRIBUTE6 = A.CostCenter
{% endmacro %}
2. I then call this macro in the marts model where I want to update the table columns.
Here is the file under the marts models:
with dbt_A
as
(
select *
from {{ Table_A)}}
)
select * from dbt_A
{% Call update_table_ATTRIBUTE6Desc % }
The error message I'm getting is: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Parse error at line: 16, column: 1: Incorrect syntax near 'select'. (103010) (SQLExecDirectW)")

dbt get value from agate.Row to string

I want to run a macro that issues a COPY INTO statement to an S3 bucket. Apparently Snowflake doesn't allow a dynamic path, so I'm solving this in a hacky way.
{% macro unload_snowflake_to_s3() %}
{# Get all tables and views from the information schema. #}
{%- set query -%}
select concat('COPY INTO @MY_STAGE/year=', year(current_date()), '/my_file FROM (SELECT OBJECT_CONSTRUCT(*) from my_table)');
{%- endset -%}
-- {%- set final_query = run_query(query) -%}
-- {{ dbt_utils.log_info(final_query) }}
-- {{ dbt_utils.log_info(final_query.rows.values()[0]) }}
{%- do run_query(final_query.columns.values()[0]) -%}
-- {% do final_query.print_table() %}
{% endmacro %}
Based on the macro above, what I'm trying to do is:
1. Use CONCAT to add the year to the bucket path, so the query becomes a string.
2. Pass the concatenated query to run_query() again to actually run the COPY INTO statement.
Output and error I got from the dbt log:
09:06:08 09:06:08 + | column | data_type |
| ----------------------------------------------------------------------------------------------------------- | --------- |
| COPY INTO @MY_STAGE/year=', year(current_date()), '/my_file FROM (SELECT OBJECT_CONSTRUCT(*) from my_table) | Text |
09:06:08 09:06:08 + <agate.Row: ('COPY INTO @MY_STAGE/year=2022/my_file FROM (SELECT OBJECT_CONSTRUCT(*) from my_table)')>
09:06:09 Encountered an error while running operation: Database Error
001003 (42000): SQL compilation error:
syntax error line 1 at position 0 unexpected '<'.
root@2c50ba8af043:/dbt#
I think the error is that I didn't extract the row and column specifically which is in agate format. How can I convert/extract this to string?
You might have better luck with dbt_utils.get_query_results_as_dict.
But you don't need to use your database to construct that path. The jinja context has a run_started_at variable that is a Python datetime object, so you can build your string in jinja, without hitting the database:
{% set yr = run_started_at.strftime("%Y") %}
{% set query = 'COPY INTO @MY_STAGE/year=' ~ yr ~ '/my_file FROM (SELECT OBJECT_CONSTRUCT(*) from my_table)' %}
Finally, depending on how you're calling this macro you probably want to gate this whole thing with an {% if execute %} flag, so dbt doesn't do the COPY when it's parsing your models.
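To see what that string-building step produces, here is a minimal Python sketch of the same idea. It assumes run_started_at behaves like an ordinary Python datetime, as described above; the helper name build_copy_query and the fixed timestamp are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone


def build_copy_query(stage: str, run_started_at: datetime) -> str:
    """Build the COPY INTO statement, taking the year from a datetime."""
    yr = run_started_at.strftime("%Y")  # same call as in the Jinja snippet
    return (
        "COPY INTO " + stage + "/year=" + yr
        + "/my_file FROM (SELECT OBJECT_CONSTRUCT(*) from my_table)"
    )


# Hypothetical stand-in for dbt's run_started_at context variable.
started = datetime(2022, 5, 1, 9, 6, 8, tzinfo=timezone.utc)
query = build_copy_query("@MY_STAGE", started)
print(query)
```

The result is already a plain string, so there is no agate.Row to unwrap before handing it to run_query.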
You can use the dbt_utils.get_query_results_as_dict function to get rid of the agate wrapper. After that, your COPY statement may work.
{%- set final_query = dbt_utils.get_query_results_as_dict(query) -%}
{{log(final_query ,true)}}
{% for keys,val in final_query.items() %}
{{log(keys,true)}}
{{log( val ,true)}}
{% endfor %}
If you run it like this, the logged value will look like ('COPY INTO @MY_STAGE/year=', year(current_date())...'). Finally, strip the surrounding "('')" characters with:
{%- set final_val=val | replace('(', '')| replace(')', '') | replace("'", '') -%}
That's it.

DBT macro for repetitive task

I am a beginner in dbt. I have created an incremental model like the one below, and I need to execute the same incremental model logic for different systems.
There are 3 variables that I need to pass: for each run, ATTRIBUTE_NAME, VIEW_NAME, and SYSTEM_NAME must be supplied. For the next run, all 3 parameters would be different.
However, for a particular SYSTEM_NAME, the VIEW_NAME and ATTRIBUTE_NAME are fixed.
Please help me execute the dbt run using a macro for this requirement, passing the different system names and their corresponding view names and attribute names. The objective is to use a single dbt run statement and execute this model for every combination of ATTRIBUTE_NAME, VIEW_NAME, and SYSTEM_NAME.
For now, I have defined variables and execute a separate run for each system from the CLI, like below:
e.g.
dbt run --vars '{"VIEW_NAME": CCC, "SYSTEM_NAME": BBBB, "ATTRIBUTE_NAME": AAAA}' -m incremental_modelname
dbt run --vars '{"VIEW_NAME": DDD, "SYSTEM_NAME": FFF, "ATTRIBUTE_NAME": HHH}' -m incremental_modelname
dbt run --vars '{"VIEW_NAME": EEE, "SYSTEM_NAME": GGG, "ATTRIBUTE_NAME": III}' -m incremental_modelname
Reusable incremental model:
{{
config(
materialized='incremental',
transient=false,
unique_key='composite_key',
post_hook="insert into table (col1, col2, col3)
select
'{{ var('ATTRIBUTE_NAME') }}',
col2,
col3
from {{ this }} a
join table b on a=b
where b.SYSTEM_NAME='{{ var('SYSTEM_NAME') }}';
commit;"
)
}}
with name1 AS (
select
*
from {{ var('VIEW_NAME') }}
),
select
*
from name1
{% if is_incremental() %}
where (select timestamp_column from {{ var('VIEW_NAME') }}) >
(select max(timestamp_column) from {{ this }} where SYSTEM_NAME='{{ var("SYSTEM_NAME") }}')
{% endif %}
The easiest way would be to:
1. Create a model (or even a seed) that holds the system name, view name, and attribute name.
2. Within your code, add a for loop over the rows of that model:
{% set query %}
select system_name, view_name, attribute_name from model_name
{% endset %}
{% set results = run_query(query) %}
{% for result in results.rows %}
/*
Put your query here, but reference the columns needed:
result[0] = system_name
result[1] = view_name
result[2] = attribute_name
*/
{% endfor %}

dbt if/else macros return nothing

I'm trying to use a dbt macro to transform survey results.
I have a table similar to:
column1 | column2
often | sometimes
never | always
... | ...
I want to transform it into:
column 1 | column 2
3 | 2
1 | 4
... | ...
using the following mapping:
category | value
always | 4
often | 3
sometimes | 2
never | 1
To do so I have written the following dbt macro:
{% macro class_to_score(class) %}
{% if class == "always" %}
{% set result = 1 %}
{% elif class == "often" %}
{% set result = 2 %}
{% elif class == "sometimes" %}
{% set result = 3 %}
{% elif class == "never" %}
{% set result = 4 %}
{% endif -%}
{{ return(result) }}
{% endmacro %}
and then the following sql query:
{%- set class_to_score = class_to_score -%}
select
{{ set_class_to_score(column1) }} as column1_score,
from
table
However, I get the error:
Syntax error: SELECT list must not be empty at [5:1]
Anyone know why I am not getting anything back?
Thanks for the time you took to communicate your question. It's not easy! It looks like you're experiencing the number one misconception when it comes to dbt and Jinja:
Jinja isn't about transforming data, it's about composing a single SQL query that will be sent to the database. After everything inside Jinja's curly brackets is rendered, you are left with a query that can be sent to the database.
This notion does get complicated with dbt macros like run_query (docs), which will go to the database and get information. But the info you fetch can only be used to generate the SQL string.
Your example describes how you would do things with Python's pandas, where the transformations happen in memory. In dbt-land, only the string generation happens in memory, though sometimes we fetch some of the substrings from the database before building the new query. So although you'd like Jinja to look at every value in the column and make the substitution, what you really need to do is generate a query that instructs the database to make the substitution. The way we do substitution in SQL is with CASE WHEN statements (see Mode's CASE docs for more info).
This is probably closer to what you want. Note that it's probably better to turn the likert_map object into a dbt seed table.
{% set likert_map =
{"1": "always", "2": "often", "3": "sometimes", "4": "never"} %}
SELECT
CASE column_1
{% for key, value in likert_map.items() %}
WHEN '{{ value }}' THEN {{ key }}
{% endfor %}
ELSE 0 END AS column_1_new,
CASE column_2
{% for key, value in likert_map.items() %}
WHEN '{{ value }}' THEN {{ key }}
{% endfor %}
ELSE 0 END AS column_2_new
FROM
table
Here are some related questions that use a mapping dictionary to build a SQL query:
How to join two tables into a dictionary in dbt jinja
DBT - for loop issue with number as variable
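To make the "Jinja only builds a string" point concrete, here is a rough Python sketch of the same loop. It is not dbt code: case_expression is a hypothetical helper, and the mapping uses the category/value table from the question. The only thing produced is SQL text; the database does the actual substitution.

```python
def case_expression(column: str, mapping: dict, default: int = 0) -> str:
    """Return a SQL CASE expression mapping category strings to scores."""
    lines = [f"CASE {column}"]
    for category, score in mapping.items():
        lines.append(f"    WHEN '{category}' THEN {score}")
    lines.append(f"    ELSE {default} END AS {column}_score")
    return "\n".join(lines)


# Mapping taken from the question's category/value table.
likert_map = {"always": 4, "often": 3, "sometimes": 2, "never": 1}

select_list = ",\n".join(
    case_expression(col, likert_map) for col in ["column1", "column2"]
)
sql = f"SELECT\n{select_list}\nFROM table"
print(sql)
```

Nothing here ever sees a row of data, which is why a Jinja if/else on the column name in the original macro rendered nothing.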

Assign the value of a column to a variable in SQL using the Jinja template language

I have a SQL file like this to transform a table that has a column containing a JSON string:
{{ config(materialized='table') }}
with customer_orders as (
select
time,
data as jsonData,
{% set my_dict = fromjson( jsonData ) %}
{% do log("Printout: " ~ my_dict, info=true) %}
from `warehouses.raw_data.customer_orders`
limit 5
)
select *
from customer_orders
When I run dbt run, it returns this:
Running with dbt=0.21.0
Encountered an error:
the JSON object must be str, bytes or bytearray, not Undefined
I can't even print out the value of the column I want:
{{ config(materialized='table') }}
with customer_orders as (
select
time,
tag,
data as jsonData,
{% do log("Printout: " ~ data, info=true) %}
from `warehouses.raw_data.customer_orders`
limit 5
)
select *
from customer_orders
22:42:58 | Concurrency: 1 threads (target='dev')
22:42:58 |
Printout:
22:42:58 | Done.
But if I create another model to print out the values of jsonData:
{%- set payment_methods = dbt_utils.get_column_values(
table=ref('customer_orders_model'),
column='jsonData'
) -%}
{% do log(payment_methods, info=true) %}
{% for json in payment_methods %}
{% set my_dict = fromjson(json) %}
{% do log(my_dict, info=true) %}
{% endfor %}
it prints out the JSON value I want:
Running with dbt=0.21.0
This is log
Found 2 models, 0 tests, 0 snapshots, 0 analyses, 372 macros, 0 operations, 0 seed files, 0 sources, 0 exposures
21:41:15 | Concurrency: 1 threads (target='dev')
21:41:15 |
['{"log": "ok", "path": "/var/log/containers/...log", "time": "2021-10-26T08:50:52.412932061Z", "offset": 527, "stream": "stdout", "@timestamp": 1635238252.412932}']
{'log': 'ok', 'path': '/var/log/containers/...log', 'time': '2021-10-26T08:50:52.412932061Z', 'offset': 527, 'stream': 'stdout', '@timestamp': 1635238252.412932}
21:41:21 | Done.
But I want to process this jsonData within a model file like customer_orders_model above.
How can I get the value of a column, assign it to a variable, and continue processing it however I want (for example, check whether the JSON has a key I want and set its value in a new column)?
Note: my table has a JSON string column, and I want to extract it into many columns so I can easily write the SQL queries I need.
For BigQuery, Google provides JSON functions in Standard SQL.
If your column is a JSON string, I think you can use JSON_EXTRACT to get the value of the key you want.
Example:
with customer_orders as (
select
time,
tag,
data as jsonData,
json_extract(data, '$.log') AS log,
from `dc-warehouses.raw_data.logs_trackfoe_prod`
limit 5
)
select *
from customer_orders
You are very close! The thing to remember is that dbt and Jinja are primarily for rendering text. Anything that isn't in curly brackets is just a text string.
So in your first example, data and jsonData are substrings of the larger query (which is also a string). They aren't variables that Jinja knows about, which explains the error message saying they are Undefined.
with customer_orders as (
select
time,
data as jsonData,
from `warehouses.raw_data.customer_orders`
limit 5
)
select *
from customer_orders
This is why dbt_utils.get_column_values() works for you: that macro actually runs a query to get the data, and you're assigning the result to a variable. The run_query macro can be helpful for situations like this (and I'm fairly certain get_column_values uses run_query in the background).
Regarding your original question, where you want to turn a JSON dict into multiple columns, I'd first recommend having your database do this directly. Many databases have functions that let you do this. Jinja is primarily for generating SQL queries dynamically, not for manipulating data. Even if you could load all the JSON into Jinja, I don't know how you'd write it back into a table without using something like an INSERT INTO VALUES statement, which, IMHO, goes against the design principle of dbt.
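As a rough illustration of "generate the SQL, let the database extract the JSON", here is a small Python sketch that builds one json_extract expression per key. The helper name json_columns, the json_ alias prefix, and the chosen key list are assumptions for illustration; the json_extract(data, '$.log') form and the table name are the ones used earlier in this thread.

```python
def json_columns(source_column: str, keys: list, prefix: str = "json_") -> str:
    """Return one json_extract(...) select expression per JSON key."""
    return ",\n".join(
        f"    json_extract({source_column}, '$.{key}') AS {prefix}{key}"
        for key in keys
    )


# Keys picked from the logged JSON sample above (illustrative subset).
keys = ["log", "path", "stream"]

sql = (
    "select\n"
    "    time,\n"
    + json_columns("data", keys)
    + "\nfrom `warehouses.raw_data.customer_orders`"
)
print(sql)
```

In a dbt model the same loop would be a Jinja for loop over a list of keys; either way the JSON values themselves never pass through the templating layer.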

Retrieving table name from snowflake information_schema using dbt

I have created a macro that returns a table name from the INFORMATION_SCHEMA in Snowflake.
I have tables in Snowflake as follows:
------------
| TABLES |
------------
| ~one |
| ~two |
| ~three |
------------
I want to pass the table type (e.g. one) into the macro and get the actual table name (e.g. ~one).
Here is my macro (get_table.sql) in dbt that takes a parameter and returns the table name:
{%- macro get_table(table_type) -%}
{%- set table_result -%}
select distinct TABLE_NAME from "DEMO_DB"."INFORMATION_SCHEMA"."TABLES" where TABLE_NAME like '\~%{{table_type}}%'
{%- endset -%}
{%- set table_name = run_query(table_result).columns[0].values() -%}
{{ return(table_name) }}
{%- endmacro -%}
Here is my dbt model that calls the above macro:
{{ config(materialized='table',full_refresh=true) }}
select * from {{get_table("one")}}
But I am getting an error:
Compilation Error in model
'None' has no attribute 'table'
> in macro get_table (macros\get_table.sql)
I don't understand where the error is
You need to use the execute context variable to prevent this error, as described here:
https://discourse.getdbt.com/t/help-with-call-statement-error-none-has-no-attribute-table/602
Also be careful with your query: the table names are uppercase, so it is better to use "ilike" instead of "like".
Another important point is that "run_query(table_result).columns[0].values()" returns an array, so I added an index at the end.
Here's the modified version of your macro, which ran successfully in my test environment:
{% macro get_table(table_name) %}
{% set table_query %}
select distinct TABLE_NAME from "DEMO_DB"."INFORMATION_SCHEMA"."TABLES" where TABLE_NAME ilike '%{{ table_name }}%'
{% endset %}
{% if execute %}
{%- set result = run_query(table_query).columns[0].values()[0] -%}
{{return( result )}}
{% else %}
{{return( false ) }}
{% endif %}
{% endmacro %}
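The reason for the trailing [0] can be seen with a plain Python stand-in. This sketch assumes the column values come back as a tuple-like sequence and that rendering it into the query uses its string form; the sample value is taken from the question's table list.

```python
# Hypothetical stand-in for run_query(table_query).columns[0].values()
values = ("~one",)

# Interpolating the whole sequence leaks its Python repr into the SQL.
bad = f"select * from {values}"

# Indexing first yields just the table name.
good = f"select * from {values[0]}"

print(bad)
print(good)
```

Only the second string is something the database can parse as a table reference.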