dbt (data build tool) jinja modules - 'dict object' has no attribute 're' - dbt

According to DBT's docs on modules to use within jinja functions - https://docs.getdbt.com/reference/dbt-jinja-functions/modules - modules.re should be available. However, there is this macro I am working with:
{% macro camel_to_snake_case(camel_case_string) -%}
{{ modules.re.sub('([A-Z][a-z]|[A-Z]*[0-9]+)', '_\\1', modules.re.sub('([A-Z]+[A-Z]([a-z]|$))', '_\\1', camel_case_string)) | trim('_') | lower() }}
{%- endmacro %}
and whenever a script is run that uses this macro, I receive the error:
Running with dbt=0.17.0
Encountered an error:
Compilation Error in model model_using_macro (models/model_using_macro.sql)
'dict object' has no attribute 're'
Do I need to install something in order to access modules.re? Maybe the base dbt I have installed doesn't include this module at all? Is there a way to inspect the modules object to see why re is missing, and what else might be available? I'm not sure why else this error could be happening.

Try upgrading dbt; modules.re was added in dbt 0.19.0 (source).
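If you can't upgrade immediately, you can check what the macro will produce by running the same two substitutions in plain Python, since modules.re exposes Python's re module (a sketch; the function name is just illustrative):

```python
import re

def camel_to_snake_case(s):
    # Same two substitutions as the dbt macro, applied in the same order:
    # first split runs of uppercase letters followed by a lowercase letter
    # (or end of string), then split single uppercase-lowercase pairs and
    # digit groups; finally trim underscores and lowercase.
    s = re.sub('([A-Z]+[A-Z]([a-z]|$))', '_\\1', s)
    s = re.sub('([A-Z][a-z]|[A-Z]*[0-9]+)', '_\\1', s)
    return s.strip('_').lower()

print(camel_to_snake_case("CamelCaseString"))  # camel_case_string
print(camel_to_snake_case("HTTPResponse"))     # http_response
```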

How to specify model schema when referencing another dbt project as a package? (dbt multi-repo setup)

We're using a dbt multi-repo setup with different projects for different business areas. We have several projects, something like this:
dbt_dwh
dbt_project1
dbt_project2
The dbt_dwh project contains models which we plan to reference in projects 1 and 2 (we have ~10 projects that would reference the dbt_dwh project) by way of installing git packages. Ideally, we'd like to be able to just reference the models in the dbt_dwh project (e.g.
SELECT * FROM {{ ref('dbt_dwh', 'model_1') }}). However, each of our projects sits in its own database schema, and this causes issues upon dbt run because dbt uses the target schema from dbt_project_x, where these objects don't exist. I've included example set-up info below, for clarity.
packages.yml file for dbt_project1:
packages:
  - git: https://git/repo/url/here/dbt_dwh.git
    revision: master
profiles.yml for dbt_dwh:
dbt_dwh:
  target: dwh_dev
  outputs:
    dwh_dev:
      <config rows here>
    dwh_prod:
      <config rows here>
profiles.yml for dbt_project1:
dbt_project1:
  target: project1_dev
  outputs:
    project1_dev:
      <config rows here>
    project1_prod:
      <config rows here>
sf_orders.sql in dbt_dwh:
{{
  config(
    materialized = 'table',
    alias = 'sf_orders'
  )
}}
SELECT * FROM {{ source('salesforce', 'orders') }} WHERE uid IS NOT NULL
revenue_model1.sql in dbt_project1:
{{
  config(
    materialized = 'table',
    alias = 'revenue_model1'
  )
}}
SELECT * FROM {{ ref('dbt_dwh', 'sf_orders') }}
My expectation here was that dbt would examine the sf_orders model and see that the default schema for the project it sits in (dbt_dwh) is dwh_dev, so it would construct the object reference as dwh_dev.sf_orders.
However, if you use command dbt run -m revenue_model1 then the default dbt behaviour is to assume all models are located in the default target for dbt_project1, so you get something like:
11:05:03 1 of 1 START sql table model project1_dev.revenue_model1 .................... [RUN]
11:05:04 1 of 1 ERROR creating sql table model project1_dev.revenue_model1 ........... [ERROR in 0.89s]
11:05:05
11:05:05 Completed with 1 error and 0 warnings:
11:05:05
11:05:05 Runtime Error in model revenue_model1 (folder\directory\revenue_model1.sql)
11:05:05 404 Not found: Table database_name.project1_dev.sf_orders was not found
I've got several questions here:
How do you force dbt to use a specific schema on runtime when using dbt ref function?
Is it possible to force dbt to use the default parameters/settings for models inside the dbt_dwh project when this Git repo is installed as a package in another project?
Some points to note:
All objects & schemas listed above sit in the same database
I know that many people recommend mono-repo set-up to avoid exactly this type of scenario, but switching to a mono-repo structure is not feasible right now, as we are already fully invested in multi-repo setup
Although it would be feasible to create source.yml files in each of the dbt projects to reference the output objects of the dbt_dwh project, this feels like duplication of effort and could result in different versions of the same sources.yml file across projects
I appreciate it is possible to hard-code the output schema in the dbt config block, but this removes our ability to test in dev environment/schema for dbt_dwh project
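For context, the sources.yml duplication route mentioned above would look something like this in each downstream project (illustrative database/schema names, matching the example set-up):

```yaml
version: 2

sources:
  - name: dbt_dwh
    database: database_name
    schema: dwh_dev
    tables:
      - name: sf_orders
```

Models would then use {{ source('dbt_dwh', 'sf_orders') }} instead of ref(), which is exactly the per-project duplication the question is trying to avoid.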
I managed to find a solution, so I'll answer my own question in case anybody else runs up against the same issue. Unfortunately this is not documented anywhere that I can find; however, a throw-away comment in the dbt Slack workspace sparked an idea that led me to the/a solution (I'll post the message if I manage to find it again, to give credit where it's due).
The fix is actually very simple: add the project being imported to your profiles.yml file and specify the schema. For our use case this is fine, as we only use one schema.
profiles.yml for dbt_project1:
models:
  db_project_1:
    outputs:
      project1_dev:
        <configs here>
      project1_prod:
        <configs here>
  dbt_dwh:
    +schema: [[schema you want these models to run into]]
    <configs here>
The advantages with this approach are:
When you generate/serve dbt docs it allows you to see the upstream lineage from the upstream project
If there are any upstream dependencies in your upstream project you can run this using dbt run -m +model_name (this can be super handy)
If you don't want this behaviour then you can use dbt run -m +model_name --exclude dbt_dwh (for example) to prevent models in your upstream project from running.
I haven't yet figured out if it is possible to use the default parameters/settings for models inside the upstream project (in this case dbt_dwh) but I will edit this answer if I find a way.

dbt - no output on variable flags.WHICH

My issue is that when I reference the variable {{ flags.WHICH }} in Jinja, it returns no output.
I am trying to use this variable to find out which type of command dbt is currently running: run, test, generate, etc.
I am using dbt 0.18.1 with the Spark adapter.
flags.WHICH was not introduced until dbt 1.0. You'll have to upgrade to get that feature. Here is the source for the flags module, if you're interested in the flags available in your version.
Note that in jinja, referencing an undefined variable simply templates to the empty string, and does not raise an exception.
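Because of that silent-empty-string behaviour, a guarded reference makes the dependency explicit once you are on a version that supports it (a sketch using dbt's log() function):

```sql
{# On dbt >= 1.0 this logs the command name; on older versions the
   undefined attribute is falsy, so the block is simply skipped. #}
{% if flags.WHICH %}
  {{ log("dbt is running command: " ~ flags.WHICH, info=true) }}
{% endif %}
```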

Missing Template Arguments for pcl::gpu::EuclideanClusterExtraction in PCL-1.12

I am trying this example to use PCL with GPU and get the error
~/gpu-pcl/main.cpp:85: error: missing template arguments before ‘gec’
pcl::gpu::EuclideanClusterExtraction gec;
That example worked well with pcl-1.11.1, but after updating to pcl-1.12.1 I get this error.
My work environment: Ubuntu 18.04, CMake 3.20.
Is there anything that I have missed?
In the documentation of PCL 1.12 (https://pointclouds.org/documentation/classpcl_1_1gpu_1_1_euclidean_cluster_extraction.html#details):
template<typename PointT>
class pcl::gpu::EuclideanClusterExtraction< PointT >
EuclideanClusterExtraction is a class template, so the point type of the point cloud must be supplied for PointT, for example pcl::PointXYZ.

DBT filtering for (None) when running on incremental model

I'm trying to configure a DBT model as materialized='incremental', which is failing because DBT seems to wrap my model with a check on (None) or (None) is null, causing the model to throw a SQL exception against the target (BigQuery). The (None) checks don't get added for non-incremental models, or when running with --full-refresh, which just re-creates the table.
According to the docs, incremental models are supposed to be wrapped as follows:
merge into {{ destination_table }} DEST
using ({{ model_sql }}) SRC
...
However what I'm seeing is:
merge into {{ destination_table }} DEST
using ( select * from( {{ model_sql }} ) where (None) or (None) is null) SRC
...
It's not clear to me where the (None) checks are coming from, what wrapping the query is actually trying to achieve, and what (if any) model config would need to be set to correct this.
My model's config is set as {{ config(materialized='incremental', alias='some_name') }}, and I've tried also setting unique_key just in case with no luck.
I'm running the model with dbt run --profiles-dir dbt_profiles --models ${MODEL} --target development, and can confirm the compiled model itself is fine; the (None) checks only get added when the model is run.
I'm running dbt 0.11.1 (old repo version).
Any help would be most appreciated!
Managed to resolve this by looking into the DBT codebase on github for my target version - incremental macro 0.11
Seems like in 0.11 DBT expects a sql_where config flag to be set, which is used to select which records you want to use for the incremental load (precursor to the is_incremental() macro).
In my case, as I just want to load all rows in each incremental run and tag them with the load timestamp, setting sql_where='TRUE' generates valid SQL and doesn't filter my results (i.e. WHERE TRUE OR TRUE IS NULL).
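Under the 0.11 behaviour described above, the config block would look something like this (alias kept from the question; sql_where='TRUE' makes the generated filter a no-op):

```sql
{{ config(materialized='incremental', sql_where='TRUE', alias='some_name') }}
```

The generated wrapper then becomes WHERE TRUE OR TRUE IS NULL, which is always true, so every row from the model query is passed to the merge.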
Have you had an incremental model configured beforehand with 0.11.1? I'm pretty sure you need to use {{ this }}, but maybe that didn't exist in version 0.11.1 (docs on this).

How to add a Jinja function to .sqlfluff config

I'm using the jinja functions run_query and execute.
https://docs.getdbt.com/reference/dbt-jinja-functions/run_query
But when I run sqlfluff lint I get the following error:
Undefined jinja template variable: 'run_query'
I'm trying to add it to the .sqlfluff config, but there doesn't seem to be any guidance anywhere on how to do this.
Any help would be greatly appreciated!
Thanks
Add templater=dbt in your .sqlfluff config file.
More info here.
I have managed to figure out how to add run_query:
run_query = {% macro run_query(query) %}'query'{% endmacro %}
But I was still unsure how to add execute to the .sqlfluff config. Figured it out!
execute = {% macro execute() %}True{% endmacro %}
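Putting the pieces together, the relevant .sqlfluff section would look roughly like this (assuming sqlfluff's jinja templater macro section name; the macro bodies are stubs that only exist to satisfy the linter, not real implementations):

```ini
[sqlfluff:templater:jinja:macros]
run_query = {% macro run_query(query) %}'query'{% endmacro %}
execute = {% macro execute() %}True{% endmacro %}
```

Alternatively, setting templater=dbt (as the other answer suggests) lets sqlfluff delegate compilation to dbt itself, so dbt's built-in functions like run_query and execute resolve without stubs.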