Singular/Data test missing test_metadata in dbt

I am trying to set up a singular test in dbt (it's a test for one specific table, TableA), so I wrote a SQL query which I placed in the tests folder. It returns the failing rows.
However, when I run dbt test --select tableA and the test passes (no failing records), I get the following error:
14:20:57 Running dbt Constraints
14:20:58 Database error while running on-run-end
14:20:59 Encountered an error:
Compilation Error in operation dbt_constraints-on-run-end-0 (./dbt_project.yml)
'dbt.tableA.graph.compiled.CompiledSingularTestNode object' has no attribute 'test_metadata'
When the test fails, it returns the failing rows, which is the correct behaviour.
I am using the dbt_constraints package (v0.3.0), which seems to be causing this problem, specifically this script, which runs in the on-run-end hook: https://github.com/Snowflake-Labs/dbt_constraints/blob/main/macros/create_constraints.sql
I am guessing I need to add some test metadata to the singular test, but I am not sure how to do it.
Here is what the test looks like:
tests/table_a_test.sql
SELECT *
FROM {{ ref('TableA') }}
WHERE param_1 NOT IN (
    SELECT TableB_id FROM {{ ref('TableB') }}
    UNION
    SELECT TableC_id FROM {{ ref('TableC') }}
    UNION
    SELECT TableD_id FROM {{ ref('TableD') }}
    UNION
    SELECT TableE_id FROM {{ ref('TableE') }}
)
AND param_2 IS NULL
Thank you!

This seems to be a bug in that package; I would open an issue in the dbt_constraints repo. There is no documented way to add metadata to a singular test, but that code assumes that all tests will have test_metadata.name.
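For what it's worth, a hypothetical sketch of the kind of guard that package's on-run-end macro would need before reading test_metadata.name (the actual fix belongs in create_constraints.sql):
{% for res in results %}
    {# hypothetical guard: singular tests have no test_metadata attribute #}
    {% if res.node.resource_type == 'test' and res.node.test_metadata is defined %}
        {# only here is it safe to read res.node.test_metadata.name #}
    {% endif %}
{% endfor %}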
I doubt this would work, but what happens if you add a schema.yml file to the tests directory, alongside your singular test? The contents would look like:
version: 2

tests:
  - name: table_a_test

It sounds like your call should be dbt test --select table_a_test instead of dbt test --select tableA. I think you need to pass the test name, not the table name, which is already hard-coded in the (singular) test. Does that work?

Have you tried running the test with a + sign in front of the selector? Since you are using ref in the test, you might need to build everything upstream before testing.
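For example, assuming the test file is named table_a_test.sql, the + graph operator selects the test together with all of its upstream models:
dbt build --select +table_a_test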

Related

DBT Test configuration for particular scenario

Hello, could anyone help me with how to simulate this scenario? For example, I want to validate these 3 fields on my table, "symbol_type", "symbol_subtype", and "taker_symbol", and return a unique combination/result.
I tried to use this command, however it's not working properly in my test. I am not sure if this is the correct syntax to simulate my scenario. Your response is highly appreciated.
Expected result: these 3 fields should return a unique combination using dbt commands.
I'd recommend either:
- use the generate_surrogate_key (docs) macro in the model, or
- use the dbt_utils.unique_combination_of_columns (docs) generic test.
For the first case, you would need to define the following in the model:
select
    {{- dbt_utils.generate_surrogate_key(['symbol_type', 'symbol_subtype', 'taker_symbol']) }} as hashed_key_,
    (...)
from your_model
This would create a hashed value of the three columns. You could then use a unique test on that column in your YAML file, as sketched below.
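A minimal sketch of that YAML (the model and column names are assumed from the snippet above):
# your model's YAML file
- name: your_model_name
  columns:
    - name: hashed_key_
      tests:
        - unique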
For the second case, you would only need to add the generic test in your YAML file as follows:
# your model's YAML file
- name: your_model_name
  description: ""
  tests:
    - dbt_utils.unique_combination_of_columns:
        combination_of_columns:
          - symbol_type
          - symbol_subtype
          - taker_symbol
Both these approaches will let you check whether the combination of the three columns is unique over the whole model's output.

dbt: how can I rename tests to more understandable and readable custom names

The default names that dbt chooses for a test can be very long, and when the name is too long, dbt hashes the last part.
Example:
dbt_expectations_source_expect_table_row_count_to_equal_other_table_exponea_purchase_ref_test_exponea_sdv_orders_v___eventoccuredtime_yesterday_timestamp_AND_eventoccuredtime_today_timestamp___timestamp_yesterday_timestamp_AND_timestamp_today_timestamp_
How can I rename a dbt test (give it a custom name) so that it is much clearer in the logs what a test was doing when it failed?
You can use the name: keyword to give a test a custom name:
tests:
  - dbt_expectations.expect_table_row_count_to_be_between:
      name: exponea_purchase_row_count_yesterday_at_least_2500
      min_value: 2500
      row_condition: "timestamp >= {{ yesterday_timestamp() }}"
However, for the not_null generic test this doesn't seem to work, so there you have to use name for the custom name, and with test_name you specify that you want not_null:
columns:
  - name: timestamp
    description: the view is partitioned on this column
    tests:
      - name: exponea_purchase_timestamp_not_null
        test_name: not_null
See also the dbt documentation here:
https://docs.getdbt.com/reference/resource-properties/tests#define-a-custom-name-for-one-test
Or the explanation on the forum here:
https://discourse.getdbt.com/t/customizing-dbt-test-output-name/5550

How can I reference a table in dbt using its alias and a var, not its resource name?

I have been able to create a reasonably complex dbt project which contains several models, all of which rely on a single model that acts as a filter.
Broadly, the numerous models follow the pattern:
{{ config(materialized = 'view') }}

SELECT
    *
FROM
    TABLE
INNER JOIN
    {{ ref('filter_table') }} FILTER
ON
    TABLE.KEY = FILTER.KEY
The filter table, let's imagine it's called filter_table.sql, is simply:
{{ config(materialized = 'view') }}

SELECT
    *
FROM
    FILTER_SOURCE
WHERE
    RELEVANT = True
This works fine when I reference it in the numerous models like this: {{ ref('filter_table') }}.
However, when I try to use an alias for the filter table, it seems that the alias is not resolved in time for dbt to recognise it.
I amend the config of filter_table.sql to this...
{{ config(materialized = 'view', alias = 'FILT') }}
...and the references in the dependant models like this...
{{ ref(var('filter_table_alias')) }}
...with a var in dbt_project.yml set like this:
vars:
  filter_table_alias: 'FILT'
I then get a message stating that a node named 'FILT' is not found.
So my working theory is that, although dbt recognises the dependencies based on how the refs are set up, it cannot do this using an alias; presumably the alias is not processed by the time the graph is built.
Is there a quick way to set up the alias and force it to be loaded first?
Or am I barking up the wrong tree?
The alias only impacts the name of the relation where the model is materialized in your database. ref always takes a model name, not an alias.
So you can add an alias = 'FILT' config to your filter table if you want, but in the other models you must continue to ref('filter_table').
The reason for this distinction is that dbt model names must be unique (within a dbt package/project), but aliases need not be unique (if they are materialized to different schemas).
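Putting that together, a minimal sketch (names are taken from the question; dependent_model.sql is a placeholder filename):
-- models/filter_table.sql: the alias only changes the database identifier
{{ config(materialized = 'view', alias = 'FILT') }}
SELECT * FROM FILTER_SOURCE WHERE RELEVANT = True

-- models/dependent_model.sql: ref still takes the model (file) name
SELECT *
FROM TABLE
INNER JOIN {{ ref('filter_table') }} FILTER
    ON TABLE.KEY = FILTER.KEY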
You might be able to take advantage of dbt's Relation class; check out api.Relation, in which the identifier could be set to the alias, I believe...
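A hypothetical sketch of that idea; note that building the relation by hand creates no dependency edge in the DAG, so dbt would not know to build filter_table first:
-- hypothetical: resolve the relation from the alias var, bypassing ref
SELECT *
FROM {{ api.Relation.create(
    database = target.database,
    schema = target.schema,
    identifier = var('filter_table_alias')
) }}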

How to use BigQuery's new ASSERT statement with EU located data

The BigQuery release notes for July 13th 2020 announced that the ASSERT statement is now available.
I was trying it out with my data but couldn't get it to work. The issue seems to be that my data is in the EU location, as opposed to US. The release notes and documentation page make no mention of ASSERT being region-specific, so I'm unsure if I'm using it wrong or this is a bug.
To test I created two datasets, dataset_eu and dataset_us, in the relevant locations. In each I made the same table called inputs from the following query:
SELECT 'foo' AS x
UNION ALL
SELECT 'bar' AS x
Querying the US dataset with a processing location of US runs fine.
ASSERT (SELECT COUNT(*) FROM dataset_us.inputs) > 0 AS 'No rows'
However, querying the EU dataset with a processing location of EU gives an Unsupported statement ASSERT error.
ASSERT (SELECT COUNT(*) FROM dataset_eu.inputs) > 0 AS 'No rows'
I also tried including the project prefix but still got the error.
This seems to be a bug/limitation on the BigQuery side. I'm also facing the same issue while testing this new feature.
I've created a public issue in the IssueTracker.
FTR: this is even easier to reproduce. Execute the following query with different "processing location" settings in the "query options":
ASSERT TRUE
EDIT: As of today, it is working. It seems that Google resolved the issue!

Write dbt test for positive values

Is there an easy way to write a test for a column being positive in dbt?
accepted_values doesn't seem to work for continuous variables.
I know you can write queries in ./tests, but that seems like overkill for such a simple thing.
You could use dbt_utils.expression_is_true:
version: 2

models:
  - name: model_name
    tests:
      - dbt_utils.expression_is_true:
          expression: "col_a > 0"
I think the dbt_utils suggestion is good; the only reasonable alternative I can think of is writing a custom schema test:
https://docs.getdbt.com/docs/guides/writing-custom-schema-tests/
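For completeness, a minimal sketch of that route using dbt's generic test syntax (the test name is_positive is made up for illustration):
-- tests/generic/is_positive.sql
{% test is_positive(model, column_name) %}
select *
from {{ model }}
where {{ column_name }} <= 0
{% endtest %}
You could then apply it in your YAML file like any other generic test:
columns:
  - name: col_a
    tests:
      - is_positive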
But why bother when you can just use expression_is_true?
#jake