Defining BigQuery dbt sources with special characters in the table name? - google-bigquery

After reviewing both of the below resources:
Source configurations
BigQuery configurations
I was unable to find an answer to this question:
Given a standard dbt project directory, I am defining a sources.yml that points to pre-existing BigQuery tables whose qualified names contain special characters (hyphens).
sources.yml:
version: 2
sources:
  - name: bigquery
    tables:
      - name: `fa--task.dataset.addresses`
      - name: `fa--task.dataset.devices`
      - name: `fa--task.dataset.orders`
      - name: `fa--task.dataset.payments`
Quoting with backticks (`) was successful directly in a select statement:
(select * from `fa--task.dataset.orders`)
but backticks are not recognized as valid YAML in sources.yml.
The desired result would be something like:
{{ source('bigquery', '`fa--task.dataset.addresses`') }}
Edit: Updated source.yml as requested:

Try this!
version: 2
sources:
  - name: bigquery # are you sure you want to name it this? usually we name things after the data source, like 'stripe' or 'salesforce'
    schema: dataset
    database: fa--task
    tables:
      - name: addresses
      - name: devices
      - name: orders
      - name: payments
Then in your models you can do:
select * from {{ source('bigquery', 'addresses') }}
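For reference, the BigQuery adapter should quote each part of the relation with backticks when it compiles the source() call, so the hyphenated project name is handled for you; the compiled SQL would look something like:
select * from `fa--task`.`dataset`.`addresses`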
It might be worth checking out the guide on sources to wrap your head around what's happening here, as well as the docs for source properties, which contain the list of keys available under the sources: key.

Related

How can I select dbt models by `meta` field?

I am trying to figure out how, and if, I can select dbt models by their meta field.
My dbt model documentation yaml file looks as follows:
- name: my_table
  description: my table description
  meta:
    owner:
      - Analytics
  config:
    tags:
      - common
...
Now, I can select models by the defined tag with the command below:
dbt ls --select tag:common --resource-type model
Now, I would like to know how I can select models using the meta field information.
I tried the following, but this didn't work.
dbt ls --select meta.owner:Analytics
Thank you for any help!
meta properties are under config, so you should be able to select them as in this example:
dbt ls --select config.meta.owner:"team1"
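As the answer notes, meta ends up under the model's config, which is why the config.meta path works even when meta is declared at the model level in the yaml. A minimal sketch, assuming a recent dbt version and a hypothetical model file my_table.sql that sets meta from its config block instead:
{{ config(meta={'owner': 'Analytics'}) }}

select 1 as id
Either way, the selector is the same:
dbt ls --select config.meta.owner:Analytics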

dbt: how can I rename tests to more understandable and readable custom names

The default names that dbt generates for a test can be very long, and when a name is too long, dbt hashes the last part.
Example:
dbt_expectations_source_expect_table_row_count_to_equal_other_table_exponea_purchase_ref_test_exponea_sdv_orders_v___eventoccuredtime_yesterday_timestamp_AND_eventoccuredtime_today_timestamp___timestamp_yesterday_timestamp_AND_timestamp_today_timestamp_
How can I rename a dbt test (give it a custom name) so that it is much clearer in the logs what a test was doing when it failed?
You can use the name: keyword to give a test a custom name:
tests:
  - dbt_expectations.expect_table_row_count_to_be_between:
      name: exponea_purchase_row_count_yesterday_at_least_2500
      min_value: 2500
      row_condition: "timestamp >= {{ yesterday_timestamp() }}"
However, for the not_null generic test this doesn't seem to work, so you have to use name for the custom name and test_name to specify that you want not_null:
columns:
  - name: timestamp
    description: the view is partitioned on this column
    tests:
      - name: exponea_purchase_timestamp_not_null
        test_name: not_null
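A custom name also becomes the test's selector name, so (assuming a reasonably recent dbt version) you should be able to run just that one test with:
dbt test --select exponea_purchase_row_count_yesterday_at_least_2500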
See also the dbt documentation here:
https://docs.getdbt.com/reference/resource-properties/tests#define-a-custom-name-for-one-test
Or the explanation on the forum here:
https://discourse.getdbt.com/t/customizing-dbt-test-output-name/5550

Filebeat: how to create a new field from the path?

I would like to add a new field extracted from the path that will be used. I have two paths, see below.
paths:
  - /home/*/app/logs/*.log
  # - /home/v209/app/logs/*.log
  # - /home/v146/app/logs/*.log
fields:
  campaign: v209
fields_under_root: true
I would like to create the new field campaign containing only the folder name, like v209 or v146. Any idea how to do this in Filebeat?
Thank you in advance!
Here are three suggested solutions, tested with Filebeat 7.17.3.
1) Static configuration of campaign field per input
filebeat.inputs:
  - type: filestream
    id: v209
    paths:
      - "/home/v209/app/logs/*.log"
    fields:
      campaign: v209
    fields_under_root: true
  - type: filestream
    id: v146
    paths:
      - "/home/v146/app/logs/*.log"
    fields:
      campaign: v146
    fields_under_root: true
output.console:
  pretty: true
Explanation: This solution is simple. Each file input will have a field set (campaign) based on a static config.
Pros/Cons: This option has the problem of having to add a new campaign field every time you add a new path. For dynamic environments this can pose a serious operational problem, but it's dead simple to implement.
2) Dynamically extract campaign name from file path
processors:
  - dissect:
      tokenizer: "/%{key1}/%{campaign}/%{key3}/%{key4}/%{key5}"
      field: "log.file.path"
      target_prefix: ""
  - drop_fields:
      when:
        has_fields: ['key1', 'key3', 'key4', 'key5']
      fields: ['key1', 'key3', 'key4', 'key5']
Explanation: These processors work on top of your filestream or log input messages. The dissect processor will tokenize your path string and extract each element of your full path. The drop_fields processor will remove all fields of no interest and only keep the second path element (campaign id).
Pros/Cons: Assuming your path structures are stable, with this solution you don't have to do anything when new files appear under /home/*/app/logs/*.log.
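To make the dissect step concrete: given log.file.path = /home/v209/app/logs/app.log, the tokenizer above would yield (an illustration, not actual Filebeat output):
key1: home
campaign: v209
key3: app
key4: logs
key5: app.log
The drop_fields step then removes everything except campaign: v209.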
3) Script your way around
If you wish to set up more custom parsing logic, I'd suggest trying out the script processor and hacking away until your requirements are met:
https://www.elastic.co/guide/en/beats/filebeat/7.17/processor-script.html
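For instance, a rough (untested) sketch using the script processor, assuming the same /home/<campaign>/app/logs layout as above:
processors:
  - script:
      lang: javascript
      source: |
        function process(event) {
          var path = event.Get("log.file.path");
          if (!path) return;
          // with paths like /home/v209/app/logs/app.log,
          // the second segment is the campaign folder
          var parts = path.split("/");
          if (parts.length > 2) {
            event.Put("campaign", parts[2]);
          }
        }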

How to persist column descriptions in BigQuery tables

I have created models in my dbt (data build tool) project where I have specified column descriptions. My dbt_project.yml file is shown below:
models:
  sakila_dbt_project:
    # Applies to all files under models/example/
    +persist_docs:
      relation: true
      columns: true
    events:
      materialized: table
      +schema: examples
I have added +persist_docs as described by dbt as the fix to make column descriptions appear, but still no description appears in the BigQuery table.
My models/events/events.yml looks like this:
version: 2
models:
  - name: events
    description: This table contains clickstream events from the marketing website
    columns:
      - name: event_id
        description: This is a unique identifier for the event
        tests:
          - unique
          - not_null
      - name: user-id
        quote: true
        description: The user who performed the event
        tests:
          - not_null
What am I missing?
P.S. I'm using dbt version 0.21.0.
Looks consistent with the required format as shown in the docs:
dbt_project.yml
models:
..<resource-path>:
....+persist_docs:
......relation: true
......columns: true
models/schema.yml
version: 2
models:
..- name: dim_customers
....description: One record per customer
....columns:
......- name: customer_id
........description: Primary key
Maybe spacing? I converted the spaces to periods in the examples above because the number of spaces is unforgivingly specific for yml files.
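For copy-paste convenience, here are the same blocks again with real spaces (two per level):
dbt_project.yml
models:
  <resource-path>:
    +persist_docs:
      relation: true
      columns: true
models/schema.yml
version: 2
models:
  - name: dim_customers
    description: One record per customer
    columns:
      - name: customer_id
        description: Primary key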
I've started using the vscode yml formatter because of how often I run into spacing issues on these keys in both the schema.yml and the dbt_project.yml
Otherwise, this isn't for a source or an external table, right? Those are the only two artifacts that persist_docs is unsupported for.
Sources: persist_docs unsupported (see the sources tab in the docs)
External tables: unsupported (can't find it in the docs again, but I read it today in the docs or a GitHub issue)
Apache Spark: also unsupported (irrelevant here; see the Apache Spark profile)
Also, if you're going to be working with persist_docs a lot, check out this macro example persist_docs_op that Jeremy left for a run-operation to update your persisted docs in case that's all you changed!

RedisGraph - Combining multiple directives with MERGE

I am currently running the below query on Neo4j:
match (p:Initial{state: 'Initial', name: 'Initial'}), (c:Encounter{code:'abcd', state: 'Encounter', name: 'Encounter1'})
merge (p)-[:raw {person_id:'1234', type:'Encounter', code:'abcd'}]->(c)
However I am unable to do the same query on RedisGraph.
According to what I have found so far, RedisGraph does not seem to support combining MERGE with other directives.
Is there any workaround to this?
Can the query be changed to allow it to execute the same functionality without the match statement?
The only option I see right now is to split this into two queries.
The first one checks to see if p is connected to c:
MATCH (p:Initial{state: 'Initial', name: 'Initial'})-[:raw {person_id:'1234', type:'Encounter', code:'abcd'}]->(c:Encounter{code:'abcd', state: 'Encounter', name: 'Encounter1'}) RETURN p,c
If the above query returns empty, issue a second query to form the relation:
MATCH (p:Initial{state: 'Initial', name: 'Initial'}), (c:Encounter{code:'abcd', state: 'Encounter', name: 'Encounter1'}) CREATE (p)-[:raw {person_id:'1234', type:'Encounter', code:'abcd'}]->(c)
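For completeness, the two-step pattern from redis-cli would look something like this (assuming the graph is stored under a key named g, and the CREATE is only issued when the first query returns no rows):
GRAPH.QUERY g "MATCH (p:Initial {state:'Initial', name:'Initial'})-[:raw {person_id:'1234', type:'Encounter', code:'abcd'}]->(c:Encounter {code:'abcd', state:'Encounter', name:'Encounter1'}) RETURN p, c"
GRAPH.QUERY g "MATCH (p:Initial {state:'Initial', name:'Initial'}), (c:Encounter {code:'abcd', state:'Encounter', name:'Encounter1'}) CREATE (p)-[:raw {person_id:'1234', type:'Encounter', code:'abcd'}]->(c)"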