Using environment-dependent source specifications in DBT

I have a bit of an odd problem where I need to union together multiple source databases in production, but only one in lower environments. I'm using DBT and would like to use the source functionality so I can trace the origin of my data lineage, but I'm not quite sure how to handle this case. Here's the naive non-source approach:
{% set clouds = [1, 2, 3] %} {# this clouds variable will come from the environment, instead of hard coded. In lower envs, it would just be [1] #}
{% for cloudId in clouds %}
select *
from raw_{{ cloudId }}.users
{% if not loop.last %}
union all
{% endif %}
{% endfor %}
This isn't ideal, because I'm referencing my raw_n schema(s) directly. I'd love to have something like this:
version: 2
sources:
{% for cloud in env('CLOUDS') %}
  - name: raw_{{ cloud }}
    schema: raw_{{ cloud }}
    database: raw
    tables:
      - name: users
        identifier: users
{% endfor %}
So I can actually use the source() function in the sql files.
I'm not sure how to make such a configuration possible based on environment. Can this just simply not be done in dbt?

Since source is just a python/jinja function you can pass variables to it. So the following should work:
{% if target.name == 'prod' %} {# this clouds variable will come from the environment, instead of hard coded. In lower envs, it would just be [1] #}
{% set clouds = [1, 2, 3] %}
{% else %}
{% set clouds = [1] %}
{% endif %}
{% for cloudId in clouds %}
select *
{# source names are strings, so build 'raw_1', 'raw_2', ... to match the names declared in the sources file #}
from {{ source('raw_' ~ cloudId, 'users') }}
{% if not loop.last %}
union all
{% endif %}
{% endfor %}
As for the environment part, you would have to use the env_var function. Its return value is always a string, so you would write env_var('my_list').split(',') to get a list, assuming the value is comma-separated.
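Since Jinja hands .split(',') straight to Python's str.split, the parsing step can be sketched in plain Python. This is only an illustration: parse_list and clouds_from_env are hypothetical helper names, and CLOUDS is an assumed environment variable name. Note that the items stay strings ("1", not 1).

```python
import os


def parse_list(raw: str) -> list[str]:
    """Split a comma-separated string, dropping whitespace and empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def clouds_from_env(name: str = "CLOUDS", default: str = "1") -> list[str]:
    """Read the (assumed) CLOUDS variable; fall back to a single cloud."""
    return parse_list(os.environ.get(name, default))
```

Stripping each item is a small safety net for values like "1, 2, 3"; a bare split(',') would leave the leading spaces in place.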
EDIT:
Per the asker's comments, I revised the solution to take into account which environment is being used.
EDIT #2:
I know we left this off on a rather unhelpful note but now I am having a different issue that suggests a solution that might be more helpful for you.
In dbt_project.yml you can specify multiple paths for models, tests, seeds, etc., and the paths can be dynamic. So you could potentially change your model-paths to something like this: model-paths: ['models', 'models_{{ target.name }}']. With this you have multiple source.yml files: models/source.yml holds all the sources that don't change between dev/test/prod, and the sources that do need to vary live in models_{{ target.name }}.
The same goes for the models that use them.
I know this still isn't a dynamic sources file, but it preserves lineage, and you do it in YAML just like you wanted.

Setting context here, I believe your primary interest is in working with the dbt docs / lineage graph for a prod / dev case?
In that case, as you are highlighting, the manifest is generated from the source.yml files within your model directories. So - effectively what you are asking about is the way to "activate" different versions of a single source.yml file based on environment?
Fair warning: dbt Core's intentions don't align with that use case. So let's explore some alternatives.
If you want to hack something that is dbt-cli / local only, Jeremy lays out that you could approach this via bash/stdout:
The bash-y way
Leveraging the fact that dbt/Jinja can write anything its heart desires to stdout, pipe the stdout contents somewhere else.
$ dbt run-operation generate_source --args 'schema_name: jaffle_shop' > models/staging/jaffle_shop/src_jaffle_shop.yml
One reason he gives for keeping this out of dbt Core is that un-sandboxing the dbt Jinja compiler from the /target destination would have security implications, so I wouldn't expect dbt-core to change this in the future.
A non-dbt helper tool.
From later in the same comment:
At some point, I'm so on board with the request—I'm just not sure if
dbt Core can be the right tool for the job, or it's a better fit for a
helper tool written in python.
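A minimal sketch of what such a helper could look like, assuming the list of clouds is already known (for example, parsed from an environment variable). render_sources is a hypothetical name, not part of dbt, and the layout simply mirrors the sources file the asker wished for.

```python
def render_sources(clouds: list[str], database: str = "raw") -> str:
    """Build the text of a dbt sources file with one source per cloud."""
    lines = ["version: 2", "sources:"]
    for cloud in clouds:
        lines += [
            f"  - name: raw_{cloud}",
            f"    schema: raw_{cloud}",
            f"    database: {database}",
            "    tables:",
            "      - name: users",
            "        identifier: users",
        ]
    return "\n".join(lines) + "\n"


# Example: the file content a lower environment with a single cloud would get.
print(render_sources(["1"]))
```

The output would be written to a sources file inside the models directory before invoking dbt, so dbt itself only ever sees a static YAML file and the lineage graph stays intact.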
Use git hooks to "switch" the active "source.yml" file in that directory?
This is just an idea that I haven't looked into, because it's somewhat far-fetched: use your environment variables to trigger pre-run hooks that add source-dev.yml to .gitignore in production, and vice versa. The files would still have to be defined statically, so I'm not sure that helps anyway.

Related

How can I self reference the table I'm working on in a dbt Project?

I'm looking to self-reference the table I'm working on within a model file, in the config block, to alias the table name. Right now, I'm dynamically naming the alias using a Python for loop, but I would prefer that the model file recognize and designate the table name itself.
{{ config(
alias=model.table ### this.name? not sure what syntax to use here ###
) }}
select *
from {{ source('aceso', 'aceso_accountlookup') }}
{% if is_incremental() %}
where _FIVETRAN_SYNCED > (select max(_FIVETRAN_SYNCED) from {{ this }} )
{% endif %}
Currently I have no idea what syntax is required to get dbt to understand what I want it to do.

dbt currently has a strong one-database-object-per-model association, so what you seem to be describing (based on your answers to @tconbeer's questions in the comments) isn't really possible without something hacky like what you're already doing.
There is a GitHub Discussion around making it possible for dbt to generate multiple objects from a single model here that you may wish to contribute to.

DBT - how to namespace tables generated by different versions of a project without schemas?

Let's say I have a project in dbt. When I run it, it generates a bunch of tables. Now I want to change the underlying SQL and see what happens to these tables, how they differ from before the change. So I want to be able to compare all the tables generated by the old version to all the tables generated by the new version. Ideally I would like the method to work for any number of versions, not just two. Basically the question is how to put each version in its own namespace.
Method 1: run the new version of the project in a new schema, so I can compare old.foo to new.foo. But getting another schema from the database admins is a painful process.
Method 2: Have both versions in the same schema, but add a prefix, like new_ to the table name for the new version. So, old version has table foo, new version has new_foo, and I compare foo to new_foo.
Is there any convenient way to do Method 2 in dbt? Is there a third method I should be considering? Or am I doing something fundamentally wrong to even find myself in this situation? It seems like it shouldn't be such a rare problem but I can't find any information about what I can do in this situation.

One possible way to do this is to override the default alias macro. The macro gets called even if there is no alias defined in the configuration, so you can use that as an opportunity to rename the target table.
The version below prefixes any model that has no alias set in its configuration with the name of the target profile, whenever the run is not against the prod profile.
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
{%- if target.name != 'prod' and custom_alias_name is none -%}
{{ target.name ~ "_" ~ node.name }}
{%- elif target.name == 'prod' and custom_alias_name is none -%}
{{ node.name }}
{%- else -%}
{{ custom_alias_name | trim }}
{%- endif -%}
{%- endmacro %}
If your model is foo.sql and you run this against a profile named "prod", the table will be foo. If you run it against "dev", it will be dev_foo. If your model has an alias, then the alias name will take precedence regardless of the target profile. You can decide if you want to include the special behavior if the model has an alias name. Just modify the else block.
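The naming rules described above can be restated as a small Python function, which makes the three cases easy to check by hand. This is a sketch of the described behaviour only, not dbt's implementation; the function name and arguments are borrowed from the macro for readability.

```python
from typing import Optional


def generate_alias_name(
    target_name: str,
    node_name: str,
    custom_alias_name: Optional[str] = None,
) -> str:
    """Return the table name for a model under the rules described above."""
    if custom_alias_name is not None:
        # An explicit alias wins regardless of the target.
        return custom_alias_name.strip()
    if target_name != "prod":
        # Non-prod runs get the target name as a prefix.
        return f"{target_name}_{node_name}"
    return node_name


print(generate_alias_name("prod", "foo"))
print(generate_alias_name("dev", "foo"))
print(generate_alias_name("dev", "foo", "bar"))
```

To prefix aliased models as well, the first branch is the one to change.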

Liquid - Load all products from collection handler not working

I am trying to load all products by the given handler from a collection.
They will click on a box (each box is a different collection), then it should load all products for that collection.
My issue is that my assigned Liquid variable cannot read the input from JavaScript.
I am calling onclick(id, name, handle), where I catch the handle and pass it to the Liquid code.
My code is:
function loadProducts(collectionH) {
var html = '';
var handlerString = collectionH;
console.log('Loading products...');
{{collectionHandleNew}} = handlerString;
console.log({{collectionHandleNew}}); // log the handler
// Make sure the current product name is loaded
{% if collectionHandleNew -%}
{% if selectedCategory -%}
console.log('Collection handle is set: ' + {{collectionHandleNew}});
{%- for product in collections[collectionHandleNew].products -%}
console.log({{product.id}});
{%- endfor -%}
{% else -%}
console.log('Selected category not found');
{% endif -%}
{% else -%}
console.log('Collection handle is not available');
{% endif -%}
return html;
}
The console is showing this: [screenshot of the console output, not included]

The trick here is to remember that Liquid and Javascript serve two completely different purposes:
Liquid is a templating language that is parsed server-side to generate the documents that are sent to the client's browser, and is never seen by the client.
Javascript is a programming language that is parsed client-side to do dynamic things on the page and is never executed on the server.
Why is this distinction important? Because it means that we can use Liquid to generate Javascript, but the reverse is never true. It also means that any variables that we pass from Liquid to Javascript will only be current as of the time the document is generated and cannot be affected by anything that happens on the page once it's been rendered!
If your code needs to fetch a collection dynamically, I would recommend creating a function that takes a collection handle and loads the products entirely through Javascript using your favourite tool (fetch, jQuery.getJSON, XMLHttpRequest, Axios, etc).
To fetch products from a collection from the storefront, you can use /collections/<some-collection-handle>/products.json (for example, /collections/drawstring-bag-hoodies/products.json). If you're feeling fancy, you might also consider looking into the storefront GraphQL API instead of the traditional REST API.
One final note: Whenever you are dumping variables from Liquid into Javascript, I strongly recommend using the | json Liquid filter to ensure that the resulting output is Javascript-legal. There are lots of ways a Liquid variable dump can break Javascript, such as when the variable is unexpectedly empty, when it contains ' or ", when it contains line breaks, etc. By running the Liquid variable through this filter, the resulting output will be wrapped in the appropriate brackets or quotes, all special characters within will be properly escaped, and empty values will print null.
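To see why JSON encoding helps, here is a rough analogy in Python, with json.dumps standing in for the Liquid filter. It is only an illustration of the escaping idea, not Shopify's implementation, and to_js_literal is a hypothetical helper name.

```python
import json


def to_js_literal(value) -> str:
    """Serialize a server-side value into a JavaScript-legal literal."""
    return json.dumps(value)


# A value with quotes and a line break would break a naive '...' dump.
handle = 'drawstring "bag"\nhoodies'
print("var handle = " + to_js_literal(handle) + ";")

# An empty value becomes null instead of leaving a syntax error behind.
print("var missing = " + to_js_literal(None) + ";")
```

Without the encoding step, the first line would emit an unterminated string and the second would emit "var missing = ;", both of which are syntax errors in the browser.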

In ansible how to initialise a variable from another variable?

In an Ansible role, how to define a variable depending on another one?
I am designing a role and want its interface to understand a playbook variable like framework_enable_java = yes or framework_enable_java = mysql tomcat, and I want to write a vars/main.yml file that defines the boolean values
framework_enable_java_core
framework_enable_java_mysql
framework_enable_java_tomcat
according to the content of framework_enable_java. I tried the obvious definitions similar to
framework_enable_java_mysql: 'mysql' in framework_enable_java
and several more or less subtle approaches like
framework_enable_java_mysql: {{ 'mysql' in framework_enable_java }}
or
{% if 'mysql' in framework_enable_java %}
framework_enable_java_mysql: yes
{% else %}
framework_enable_java_mysql: no
{% endif %}
None of them turned out to be working. The similar looking question is unrelated as it is more like implementing variable indirection than variable deduction.
Is it at all possible to write the desired vars/main.yml for my role? What would it look like? If it is not possible, what would be the best way to make these deductions (e.g. using a task include)?

Answer from the comments:
framework_enable_java_mysql: "{{ 'mysql' in framework_enable_java }}"
The double quotes are essential here: without them, the YAML parser sees the leading { and tries to construct an object (a dictionary) instead of a string containing a template.
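Jinja's in operator behaves like Python's, so on a string it is a substring test. A minimal Python sketch of the deduction, assuming framework_enable_java always arrives as a string; derive_flags is a hypothetical helper, and framework_enable_java_core is left out because the question does not say how it should be derived.

```python
def derive_flags(framework_enable_java: str) -> dict[str, bool]:
    """Mirror the templated expression "{{ 'mysql' in framework_enable_java }}"."""
    return {
        "framework_enable_java_mysql": "mysql" in framework_enable_java,
        "framework_enable_java_tomcat": "tomcat" in framework_enable_java,
    }


print(derive_flags("mysql tomcat"))
print(derive_flags("yes"))
```

Because this is a substring match, a value such as "mysqlx" would also enable the mysql flag; splitting the string into words first would make the check stricter.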

url templatetag with "safe" arguments?

I'm trying to use the {% url %} template tag but with an argument to be substituted out later in Javascript. It looks something like this:
var pid = '7a8b323f-52b1-466c-91d3-b4i4d85b1c32';
var status_url = '{% url quote_status form_urlname inquiry_id instance_id '{0}' %}'.format(pid);
I tried using both {% autoescape off %} and |safe, neither of which seemed to work. Is there a good way to make this happen?

(snip previous answer, sorry, didn't read carefully enough)
If the argument is required to build the url, it just won't work - the templatetag is executed on the server, the javascript is executed on the browser.
