DBT filtering for (None) when running on incremental model - sql

I'm trying to configure a DBT model as materialized='incremental', which is failing because DBT seems to be wrapping my model with a check on (None) or (None) is null, causing the model to throw a SQL exception against the target (BigQuery). The (None) checks don't seem to get added for non-incremental models, or when running with --full-refresh, which just re-creates the table.
According to the docs, incremental models are supposed to be wrapped as follows:
merge into {{ destination_table }} DEST
using ({{ model_sql }}) SRC
...
However what I'm seeing is:
merge into {{ destination_table }} DEST
using ( select * from( {{ model_sql }} ) where (None) or (None) is null) SRC
...
It's not clear to me where the (None) checks are coming from, what DBT is actually trying to achieve by wrapping the query, or what (if any) model config would need to be set to correct this.
My model's config is set as {{ config(materialized='incremental', alias='some_name') }}, and I've also tried setting unique_key just in case, with no luck.
I'm running the model with dbt run --profiles-dir dbt_profiles --models ${MODEL} --target development, and can confirm the compiled model looks fine; the (None) checks only get added when the model is actually run.
I'm running dbt 0.11.1 (old repo version).
Any help would be most appreciated!

Managed to resolve this by looking into the DBT codebase on GitHub for my target version - incremental macro 0.11
It seems that in 0.11 DBT expects a sql_where config flag to be set, which is used to select which records you want to use for the incremental load (a precursor to the is_incremental() macro).
In my case, as I just want to load all rows in each incremental run and tag them with the load timestamp, setting sql_where='TRUE' generates valid SQL and doesn't filter my results (i.e. WHERE TRUE OR TRUE IS NULL).
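For anyone hitting the same thing, a minimal sketch of a model using that flag (the alias, column and upstream model names here are illustrative, not my real ones):
{{ config(
    materialized = 'incremental',
    alias = 'some_name',
    sql_where = 'TRUE'
) }}

SELECT
    *,
    CURRENT_TIMESTAMP() AS load_ts  -- tag each row with the load timestamp; column name illustrative
FROM {{ ref('raw_events') }}  -- upstream model name illustrative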

Have you had an incremental model configured beforehand with 0.11.1? I'm pretty sure you need to use {{ this }}, but maybe that didn't exist in version 0.11.1. See the docs on this.
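For anyone on a newer dbt version: the modern equivalent of sql_where is an is_incremental() block that filters against {{ this }}, roughly like this (model and column names are illustrative):
{{ config(materialized='incremental', unique_key='id') }}

SELECT *
FROM {{ ref('stg_events') }}

{% if is_incremental() %}
  -- only pick up rows newer than what is already in the target table
  WHERE loaded_at > (SELECT MAX(loaded_at) FROM {{ this }})
{% endif %}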

Related

How to specify model schema when referencing another dbt project as a package? (dbt multi-repo setup)

We're using a dbt multi-repo setup with different projects for different business areas. We have several projects, something like this:
dbt_dwh
dbt_project1
dbt_project2
The dbt_dwh project contains models which we plan to reference in projects 1 and 2 (we have ~10 projects that would reference the dbt_dwh project) by way of installing git packages. Ideally, we'd like to be able to just reference the models in the dbt_dwh project (e.g.
SELECT * FROM {{ ref('dbt_dwh', 'model_1') }}). However, each of our projects sits in its own database schema, and this causes issues upon dbt run because dbt uses the target schema from dbt_project_x, where these objects don't exist. I've included example set-up info below, for clarity.
packages.yml file for dbt_project1:
packages:
  - git: https://git/repo/url/here/dbt_dwh.git
    revision: master
profiles.yml for dbt_dwh:
dbt_dwh:
  target: dwh_dev
  outputs:
    dwh_dev:
      <config rows here>
    dwh_prod:
      <config rows here>
profiles.yml for dbt_project1:
dbt_project1:
  target: project1_dev
  outputs:
    project1_dev:
      <config rows here>
    project1_prod:
      <config rows here>
sf_orders.sql in dbt_dwh:
{{
    config(
        materialized = 'table',
        alias = 'sf_orders'
    )
}}
SELECT * FROM {{ source('salesforce', 'orders') }} WHERE uid IS NOT NULL
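The source() call above assumes a source definition along these lines somewhere in dbt_dwh (the schema name below is illustrative):
version: 2

sources:
  - name: salesforce
    schema: salesforce_raw   # illustrative raw schema name
    tables:
      - name: orders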
revenue_model1.sql in dbt_project1:
{{
    config(
        materialized = 'table',
        alias = 'revenue_model1'
    )
}}
SELECT * FROM {{ ref('dbt_dwh', 'sf_orders') }}
My expectation here was that dbt would examine the sf_orders model and see that the default schema for the project it sits in (dbt_dwh) is dwh_dev, so it would construct the object reference as dwh_dev.sf_orders.
However, if you use the command dbt run -m revenue_model1, then the default dbt behaviour is to assume all models are located in the default target for dbt_project1, so you get something like:
11:05:03 1 of 1 START sql table model project1_dev.revenue_model1 .................... [RUN]
11:05:04 1 of 1 ERROR creating sql table model project1_dev.revenue_model1 ........... [ERROR in 0.89s]
11:05:05
11:05:05 Completed with 1 error and 0 warnings:
11:05:05
11:05:05 Runtime Error in model revenue_model1 (folder\directory\revenue_model1.sql)
11:05:05 404 Not found: Table database_name.project1_dev.sf_orders was not found
I've got several questions here:
How do you force dbt to use a specific schema at runtime when using the dbt ref function?
Is it possible to force dbt to use the default parameters/settings for models inside the dbt_dwh project when this Git repo is installed as a package in another project?
Some points to note:
All objects & schemas listed above sit in the same database
I know that many people recommend a mono-repo set-up to avoid exactly this type of scenario, but switching to a mono-repo structure is not feasible right now, as we are already fully invested in the multi-repo setup
Although it would be feasible to create sources.yml files in each of the dbt projects to reference the output objects of the dbt_dwh project, this feels like duplication of effort and could result in different versions of the same sources.yml file across projects
I appreciate it is possible to hard-code the output schema in the dbt config block, but this removes our ability to test in a dev environment/schema for the dbt_dwh project
I managed to find a solution so I'll answer my own question in case anybody else runs up against the same issue. Unfortunately this is not documented anywhere that I can find, however, a throw-away comment in the dbt Slack workspace sparked an idea that allowed me to find the/a solution (I'll post the message if I manage to find it again, to give credit where it's due).
To fix this is actually very simple: you just need to add the project being imported to the models: block of dbt_project1's dbt_project.yml file and specify the schema there. For our use case this is fine as we only have 1 schema we use.
dbt_project.yml for dbt_project1:
models:
  dbt_project1:
    <configs here>
  dbt_dwh:
    +schema: [[schema you want these models to run into]]
    <configs here>
The advantages with this approach are:
When you generate/serve dbt docs it allows you to see the upstream lineage from the upstream project
If there are any upstream dependencies in your upstream project you can run this using dbt run -m +model_name (this can be super handy)
If you don't want this behaviour then you can use dbt run -m +model_name --exclude dbt_dwh (for example) to prevent models in your upstream project from running.
I haven't yet figured out if it is possible to use the default parameters/settings for models inside the upstream project (in this case dbt_dwh) but I will edit this answer if I find a way.

dbt - no output on variable flags.WHICH

My issue is that when I invoke the variable {{ flags.WHICH }} via Jinja, it returns no output.
I am trying to use this variable to find out which type of command dbt is running at the moment: a run, a test, generate, etc.
I am using dbt 0.18.1 with the Spark adapter.
flags.WHICH was not introduced until dbt 1.0. You'll have to upgrade to get that feature. Here is the source for the flags module, if you're interested in the flags available in your version.
Note that in Jinja, referencing an undefined variable simply templates to the empty string and does not raise an exception.
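Once you're on dbt >= 1.0, a minimal sketch of reading the flag in a macro (the macro name is illustrative):
{% macro log_invocation_type() %}
    {# flags.WHICH holds the current sub-command, e.g. 'run', 'test' or 'compile' #}
    {{ log("dbt is currently executing: " ~ flags.WHICH, info=True) }}
{% endmacro %}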

DBT - [WARNING]: Did not find matching node for patch

I keep getting the warning below when I use dbt run - I can't find anything in the dbt documentation on why it occurs or how to fix it.
[WARNING]: Did not find matching node for patch with name 'vGenericView' in the 'models' section of file 'models\generic_schema\schema.sql'
Did you by chance recently upgrade to dbt 1.0.0? If so, this means that you have a model, vGenericView, defined in a schema.yml but no vGenericView.sql model file to which it corresponds.
If all views and tables defined in your schema.yml are 1-to-1 with model files, then try running dbt clean and then test or run afterwards.
Not sure what happened to my project, but I ran into frustration looking for missing and/or misspelled files when the warnings were just leftovers from compiled files that hadn't been cleaned out; I had previously moved views around to different schemas and renamed others.
So the mistake is here in the naming:
The model name in the models.yml file should for example be: employees
And the sql file should be named: employees.sql
So your models.yml will look like:
version: 2
models:
  - name: employees
    description: "View of employees"
And there must be a model with file name: employees.sql
One case where this will happen is if you have the same data source defined in two different schema.yml files (or whatever you call them).
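As a rough illustration of the situation that answer describes, the duplicate-definition case would look something like this (paths and names are illustrative):
# models/generic_schema/schema.yml
version: 2
models:
  - name: vGenericView

# models/other_schema/schema.yml
version: 2
models:
  - name: vGenericView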

dbt (data build tool) jinja modules - 'dict object' has no attribute 're'

According to DBT's docs on the modules available within Jinja functions - https://docs.getdbt.com/reference/dbt-jinja-functions/modules - modules.re should be available. Here is the macro I am working with:
{% macro camel_to_snake_case(camel_case_string) -%}
{{ modules.re.sub('([A-Z][a-z]|[A-Z]*[0-9]+)', '_\\1', modules.re.sub('([A-Z]+[A-Z]([a-z]|$))', '_\\1', camel_case_string)) | trim('_') | lower() }}
{%- endmacro %}
and whenever a script is run that uses this macro, I receive the error:
Running with dbt=0.17.0
Encountered an error:
Compilation Error in model model_using_macro (models/model_using_macro.sql)
'dict object' has no attribute 're'
Do I need to install something in order to access modules.re? Maybe the base dbt I have installed doesn't have this module at all? Perhaps there is a way I can inspect the modules object to see why re is missing, and what else might be available or missing? I'm not sure why else this error could be happening.
Try upgrading dbt; modules.re was added in 0.19.0 (source).
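If you do upgrade, it can also help to pin a minimum dbt version in dbt_project.yml so older installs fail fast instead of hitting this at compile time (the version range below is illustrative):
# dbt_project.yml
require-dbt-version: [">=0.19.0", "<2.0.0"]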

airflow test mode xcom pull/push not working

I'm trying to test 2 tasks through the Airflow CLI test command.
The first task runs, auto-pushes the last console output to XCom, and I see the value some value in the Airflow GUI as expected.
When I run the second task via the Airflow CLI test command I just get None as the return value, but from what I have read here: How to test Apache Airflow tasks that use XCom, it should work. The xcom_push is obviously working, so why not the xcom_pull?
Does someone have a hint on how to get this working?
provide_context is set to true.
Example code:
from airflow.operators.bash_operator import BashOperator

# `dag` is assumed to be defined earlier in the DAG file
t1 = BashOperator(
    task_id='t1',
    bash_command='echo "some value"',
    xcom_push=True,  # push the last line of stdout to XCom
    dag=dag
)
t2 = BashOperator(
    task_id='t2',
    bash_command='echo {{ ti.xcom_pull(task_ids="t1") }}',  # templated pull of t1's XCom value
    xcom_push=True,
    dag=dag
)
Thanks!
Edit: when I run the code (DAG) without test mode, the xcom_pull works fine.
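For reference, the test invocations in question look roughly like this (the DAG id and execution date are placeholders):
airflow test example_dag t1 2021-01-01
airflow test example_dag t2 2021-01-01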
As far as I know, "test" runs without saving anything to the metadata database, which is why you get "None" when you run the puller task, and why it works when you actually run the DAG code.
You can query the metadata database directly after testing the first task to verify this.
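A minimal sketch of that check against Airflow's xcom table in the metadata database (the dag_id value is a placeholder):
SELECT dag_id, task_id, key, value
FROM xcom
WHERE dag_id = 'example_dag'
  AND task_id = 't1';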
Context seems to be missing here; along with xcom_push=True, we need to use provide_context=True.