How can I select dbt models by `meta` field?

I am trying to figure out whether, and how, I can select dbt models by their meta field.
My dbt model documentation yaml file looks as follows:
- name: my_table
  description: my table description
  meta:
    owner:
      - Analytics
  config:
    tags:
      - common
  ...
Now, I can select models by the defined tag with the command below:
dbt ls --select tag:common --resource-type model
Now I would like to know how to select models using the meta field information.
I tried the following, but it didn't work:
dbt ls --select meta.owner:Analytics
Thank you for any help!

meta properties are exposed under config, so you should be able to select them like this:
dbt ls --select config.meta.owner:"team1"
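If the selector does not match in your dbt version, the same filter can be applied to dbt's manifest artifact directly. The sketch below is only an illustration: it assumes the manifest layout in which each entry under nodes carries resource_type and config.meta, the helper name select_by_meta is hypothetical, and the inline dictionary stands in for the contents of target/manifest.json.

```python
def select_by_meta(manifest, key, value):
    """Return sorted names of models whose config.meta[key] equals or contains value."""
    selected = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        meta_value = node.get("config", {}).get("meta", {}).get(key)
        # meta values may be scalars or lists (the question uses a list)
        values = meta_value if isinstance(meta_value, list) else [meta_value]
        if value in values:
            selected.append(node["name"])
    return sorted(selected)


# Hypothetical stand-in for json.load(open("target/manifest.json"))
manifest = {
    "nodes": {
        "model.proj.my_table": {
            "name": "my_table",
            "resource_type": "model",
            "config": {"meta": {"owner": ["Analytics"]}, "tags": ["common"]},
        },
        "model.proj.other_table": {
            "name": "other_table",
            "resource_type": "model",
            "config": {"meta": {"owner": "team1"}, "tags": []},
        },
        "test.proj.some_test": {
            "name": "some_test",
            "resource_type": "test",
            "config": {"meta": {"owner": ["Analytics"]}},
        },
    }
}

print(select_by_meta(manifest, "owner", "Analytics"))
```

Handling both scalar and list values is a design choice made here because the question stores owner as a list while the answer's example uses a single string.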

Related

DBT Test configuration for particular scenario

Hello, could anyone help me set up a test for this scenario? For example, I want to validate these 3 fields on my table, "symbol_type", "symbol_subtype", and "taker_symbol", and return the unique combination/result.
I tried to use this command, but it is not working properly in my test. I am not sure this is the correct syntax for my scenario. Your response is highly appreciated.
Expected result: these 3 fields should return my unique combination using dbt commands.
I'd recommend either:
use the generate_surrogate_key (docs) macro in the model, or
use the dbt_utils.unique_combination_of_columns (docs) generic test.
For the first case, you would need to define the following in the model:
select
    {{ dbt_utils.generate_surrogate_key(['symbol_type', 'symbol_subtype', 'taker_symbol']) }} as hashed_key_,
    (...)
from your_model
This would create a hashed value of the three columns. You could then use a unique test in your YAML file.
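Conceptually, a surrogate key of this kind casts each column to a string, substitutes a placeholder for nulls, joins the values with a separator, and hashes the result. The Python sketch below mirrors that idea only; the separator, the placeholder text, and the use of MD5 are assumptions for illustration, not a statement of the SQL the macro emits on your warehouse.

```python
import hashlib

# Assumed values, chosen for illustration only
NULL_PLACEHOLDER = "_dbt_utils_surrogate_key_null_"
SEPARATOR = "-"


def surrogate_key(row, columns):
    """Hash the string form of the given columns into one key."""
    parts = [
        NULL_PLACEHOLDER if row.get(col) is None else str(row[col])
        for col in columns
    ]
    return hashlib.md5(SEPARATOR.join(parts).encode("utf-8")).hexdigest()


columns = ["symbol_type", "symbol_subtype", "taker_symbol"]
row_a = {"symbol_type": "spot", "symbol_subtype": "major", "taker_symbol": "BTC"}
row_b = {"symbol_type": "spot", "symbol_subtype": "major", "taker_symbol": "ETH"}

print(surrogate_key(row_a, columns))
print(surrogate_key(row_b, columns))
```

Two rows receive the same key only when all three column values match, which is why a plain unique test on the hashed column checks the uniqueness of the combination.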
For the second case, you would only need to add the generic test in your YAML file as follows:
# your model's YAML file
- name: your_model_name
  description: ""
  tests:
    - dbt_utils.unique_combination_of_columns:
        combination_of_columns:
          - symbol_type
          - symbol_subtype
          - taker_symbol
Both these approaches will let you check whether the combination of the three columns is unique over the whole model's output.
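To make the check concrete, here is a small Python sketch of what "the combination is unique" means: count each tuple of the three column values and report any that occur more than once. The function name duplicate_combinations and the sample rows are hypothetical; dbt runs the equivalent logic as SQL against your warehouse.

```python
from collections import Counter


def duplicate_combinations(rows, columns):
    """Return the column-value combinations that appear in more than one row."""
    counts = Counter(tuple(row[col] for col in columns) for row in rows)
    return [combo for combo, n in counts.items() if n > 1]


columns = ["symbol_type", "symbol_subtype", "taker_symbol"]
rows = [
    {"symbol_type": "spot", "symbol_subtype": "major", "taker_symbol": "BTC"},
    {"symbol_type": "spot", "symbol_subtype": "major", "taker_symbol": "ETH"},
    {"symbol_type": "spot", "symbol_subtype": "major", "taker_symbol": "BTC"},
]

# An empty list would mean the test passes; any entry is a failing combination
print(duplicate_combinations(rows, columns))
```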

How can I reference a table in dbt using its alias and a var, not its resource name?

I have been able to create a reasonably complex dbt project containing several models, all of which rely on a single model that acts as a filter.
Broadly, the numerous models follow the pattern:
{{ config(materialized = 'view') }}
SELECT
    *
FROM
    TABLE
INNER JOIN
    {{ ref('filter_table') }} FILTER
ON
    TABLE.KEY = FILTER.KEY
The filter table, let's imagine it's called filter_table.sql, is simply:
{{ config(materialized = 'view') }}
SELECT
    *
FROM
    FILTER_SOURCE
WHERE
    RELEVANT = True
This works fine when I reference it in the numerous models like this: {{ ref('filter_table') }}.
However, when I try to use an alias in the filter table it seems that the alias is not resolved in time for dbt to be able to recognise it.
I amend the config of filter_table.sql to this...
{{ config(materialized = 'view', alias = 'FILT') }}
...and the references in the dependent models like this...
{{ ref(var('filter_table_alias')) }}
...with a var in dbt_project.yml set like this:
vars:
  filter_table_alias: 'FILT'
I get a message though which states that the node named 'FILT' is not found.
So my working theory is that although dbt recognised the dependencies based on how the refs are set up it is not able to do this using an alias - presumably the alias is not processed by the time that it is setting up the graph.
Is there a quick way to set up the alias and force it to be loaded first?
Or am I barking up the wrong tree?
The alias only impacts the name of the relation where the model is materialized in your database. ref always takes a model name, not an alias.
So you can add an alias = 'FILT' config to your filter table if you want, but in the other models you must continue to ref('filter_table').
The reason for this distinction is that dbt model names must be unique (within a dbt package/project), but aliases need not be unique (if they are materialized to different schemas).
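The distinction can be pictured with a toy lookup: ref resolves by model name, and the alias only changes the identifier that comes back. This Python sketch is a simplified illustration and not dbt's implementation; the models dictionary, the analytics schema name, and the ref function are all hypothetical stand-ins.

```python
# Hypothetical project graph, keyed by model name (which must be unique)
models = {
    "filter_table": {"schema": "analytics", "alias": "FILT"},
    "orders": {"schema": "analytics", "alias": None},
}


def ref(model_name):
    """Resolve a model name to the relation it is materialized as."""
    if model_name not in models:
        raise LookupError(f"node named '{model_name}' was not found")
    model = models[model_name]
    # The alias, when set, replaces only the identifier in the database
    identifier = model["alias"] or model_name
    return f"{model['schema']}.{identifier}"


print(ref("filter_table"))  # looked up by name, rendered with the alias

try:
    ref("FILT")  # the alias is not a key in the graph
except LookupError as err:
    print(err)
```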
You might be able to take advantage of dbt classes: check out api.Relation, in which I believe the identifier could be set to the alias.

How to persist column descriptions in BigQuery tables

I have created models in my dbt (data build tool) project where I have specified column descriptions. My dbt_project.yml file is shown below:
models:
  sakila_dbt_project:
    # Applies to all files under models/example/
    +persist_docs:
      relation: true
      columns: true
    events:
      materialized: table
      +schema: examples
I have added +persist_docs, which dbt describes as the way to make column descriptions appear, but still no description appears in the BigQuery table.
My models/events/events.yml looks like this
version: 2
models:
  - name: events
    description: This table contains clickstream events from the marketing website
    columns:
      - name: event_id
        description: This is a unique identifier for the event
        tests:
          - unique
          - not_null
      - name: user-id
        quote: true
        description: The user who performed the event
        tests:
          - not_null
What am I missing?
P.S. I'm using dbt version 0.21.0.
Looks consistent with the required format as shown in the docs:
dbt_project.yml
models:
..<resource-path>:
....+persist_docs:
......relation: true
......columns: true
models/schema.yml
version: 2
models:
..- name: dim_customers
....description: One record per customer
....columns:
......- name: customer_id
........description: Primary key
Maybe spacing? I converted the spaces to periods in the examples above because the number of spaces is unforgivingly specific for yml files.
I've started using the vscode yml formatter because of how often I run into spacing issues on these keys in both the schema.yml and the dbt_project.yml
Otherwise, this isn't for a source or external-table right? Those are the only two artifacts that persist-docs is unsupported for.
Sources unsupported persist_docs -> sources tab
External Tables unsupported (Can't find in docs again but read today in docs or github issue)
Also Apache Spark unsupported (irrelevant here) Apache Spark Profile
Also, if you're going to be working with persist_docs a lot, check out this macro example persist_docs_op that Jeremy left for a run-operation to update your persisted docs in case that's all you changed!

Defining big query dbt sources with characters in table name?

After reviewing both of the below resources:
Source configurations
BigQuery configurations
I was unable to find an answer to this question:
Given a standard dbt project directory, I am defining a sources.yml which points to pre-existing BigQuery tables whose names contain special characters.
sources.yml:
version: 2
sources:
  - name: biqquery
    tables:
      - name: `fa--task.dataset.addresses`
      - name: `fa--task.dataset.devices`
      - name: `fa--task.dataset.orders`
      - name: `fa--task.dataset.payments`
Using backticks (`) worked directly in a select statement:
(select * from `fa--task.dataset.orders`)
but is not recognized as valid yaml in sources.
The desired result would be something like:
{{ sources('bigquery','`fa--task.dataset.addresses`') }}
Edit: Updated source.yml as requested:
Try this!
version: 2
sources:
  - name: bigquery # are you sure you want to name it this? usually we name things after the data source, like 'stripe', or 'salesforce'
    schema: dataset
    database: fa--task
    tables:
      - name: addresses
      - name: devices
      - name: orders
      - name: payments
Then in your models you can do:
select * from {{ source('bigquery', 'addresses') }}
It might be worth checking out the guide on sources to wrap your head around what's happening here, as well as the docs for source properties, which contain the list of keys available under the sources: key.
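The reason this works is that the project, dataset, and table are declared as separate parts, so the quoting is applied when the relation is rendered rather than written into the YAML. A rough Python sketch of that lookup follows; the sources dictionary and the source function here are hypothetical stand-ins, and the backtick-per-part output is an assumption about how the rendered name looks on BigQuery.

```python
# Hypothetical in-memory version of the sources.yml above
sources = {
    "bigquery": {
        "database": "fa--task",
        "schema": "dataset",
        "tables": ["addresses", "devices", "orders", "payments"],
    },
}


def source(source_name, table_name):
    """Render a declared source table as a quoted three-part relation name."""
    cfg = sources[source_name]
    if table_name not in cfg["tables"]:
        raise LookupError(f"table '{table_name}' is not declared in source '{source_name}'")
    parts = (cfg["database"], cfg["schema"], table_name)
    # Quote each part separately so characters such as '--' stay inside the quotes
    return ".".join(f"`{part}`" for part in parts)


print("select * from " + source("bigquery", "addresses"))
```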

How to properly use ActiveYaml with Rails 3.0.4?

I ran across ActiveHash/ActiveYaml while learning Rails and wanted to use it to load lookup data. After following the installation directions, I got the ActiveHash::Base stuff working. I'm trying to load data from a YML file that looks like this:
AK:
  name: Alaska
  abbreviation: AK
AL:
  name: Alabama
  abbreviation: AL
I have a class in my models folder called usstates.rb that looks like this:
class USState < ActiveYaml::Base
  set_root_path "#{RAILS_ROOT}/config/constants/"
  set_filename "USStates"
  fields :name, :abbreviation
end
I've tried to place my YML file in both the /config/constants/ and models folder. Each time I try to do something in Rails Console like USState.first, I get the following error:
NameError: uninitialized constant USState
How do I get this to load the YML file and show the items? This also fails if I comment out the sets in the class.