How can I reference a table in dbt using its alias and a var, not its resource name?

I have been able to create a reasonably complex dbt project which contains several models, all of which rely on a single model that acts as a filter.
Broadly, the numerous models follow this pattern:
{{ config(materialized = 'view') }}
SELECT
*
FROM
TABLE
INNER JOIN
{{ ref('filter_table') }} FILTER
ON
TABLE.KEY = FILTER.KEY
The filter table, let's imagine it's called filter_table.sql, is simply:
{{ config(materialized = 'view') }}
SELECT
*
FROM
FILTER_SOURCE
WHERE
RELEVANT = True
This works fine when I reference it in the numerous models like this: {{ ref('filter_table') }}.
However, when I try to use an alias in the filter table it seems that the alias is not resolved in time for dbt to be able to recognise it.
I amend the config of filter_table.sql to this...
{{ config(materialized = 'view', alias = 'FILT') }}
...and the references in the dependent models like this...
{{ ref(var('filter_table_alias')) }}
...with a var in dbt_project.yml set like this:
vars:
  filter_table_alias: 'FILT'
I get a message, though, which states that the node named 'FILT' is not found.
So my working theory is that although dbt recognises the dependencies based on how the refs are set up, it is not able to do this using an alias; presumably the alias is not processed by the time it is setting up the graph.
Is there a quick way to set up the alias and force it to be loaded first?
Or am I barking up the wrong tree?

The alias only impacts the name of the relation where the model is materialized in your database. ref always takes a model name, not an alias.
So you can add an alias = 'FILT' config to your filter table if you want, but in the other models you must continue to ref('filter_table').
The reason for this distinction is that dbt model names must be unique (within a dbt package/project), but aliases need not be unique (if they are materialized to different schemas).
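To make that concrete, a minimal sketch (keeping your file name filter_table.sql; the dependent model's name is hypothetical):

-- models/filter_table.sql
{{ config(materialized = 'view', alias = 'FILT') }}
SELECT * FROM FILTER_SOURCE WHERE RELEVANT = True

-- models/dependent_model.sql
{{ config(materialized = 'view') }}
SELECT *
FROM TABLE
INNER JOIN {{ ref('filter_table') }} FILTER  -- ref by model name; compiles to your_schema.FILT
ON TABLE.KEY = FILTER.KEY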

You might be able to take advantage of dbt's Relation class: check out api.Relation, in which the identifier could be set to the alias, I believe...
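If you want to experiment with that, here is a rough, untested sketch. Big caveat: building the relation yourself bypasses ref(), so dbt will not infer the dependency edge on its own; you would have to re-establish it explicitly (dbt supports a -- depends_on comment for exactly this):

-- depends_on: {{ ref('filter_table') }}
{# Hypothetical: build a relation whose identifier is the alias stored in the var #}
{% set filt = api.Relation.create(
    database=target.database,
    schema=target.schema,
    identifier=var('filter_table_alias')
) %}
SELECT * FROM {{ filt }}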

Related

DBT Test configuration for particular scenario

Hello, could anyone help me simulate this scenario? For example, I want to validate these 3 fields on my table ("symbol_type", "symbol_subtype", "taker_symbol") and return a unique combination/result.
I tried to use this command; however, it's not working properly in my test. I'm not sure if this is the correct syntax for my scenario. Your response is highly appreciated.
Expected result: these 3 fields should return a unique combination using dbt commands.
I'd recommend either:
use the generate_surrogate_key (docs) macro in the model, or
use the dbt_utils.unique_combination_of_columns (docs) generic test.
For the first case, you would need to define the following in the model:
select
{{- dbt_utils.generate_surrogate_key(['symbol_type', 'symbol_subtype', 'taker_symbol']) }} as hashed_key_,
(...)
from your_model
This would create a hashed value of the three columns. You could then use a unique test in your YAML file.
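For example, a sketch assuming the model is named your_model_name and the column keeps the name hashed_key_:

# your model's YAML file
models:
  - name: your_model_name
    columns:
      - name: hashed_key_
        tests:
          - unique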
For the second case, you would only need to add the generic test in your YAML file as follows:
# your model's YAML file
models:
  - name: your_model_name
    description: ""
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - symbol_type
            - symbol_subtype
            - taker_symbol
Both these approaches will let you check whether the combination of the three columns is unique over the whole model's output.
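Either way, you can then run the check with dbt's test runner, for example:
dbt test --select your_model_name
(on older dbt versions the selection flag is --models / -m instead of --select).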

TypeORM View Entity synchronization (creation) order problems

Using TypeORM, I'm trying to create ViewEntities that depend on each other, for example "View B" select from "View A". No matter what I do I can't get the ViewEntities to get created in the order of dependency. Sometimes "View B" is created first, and the synchronization process fails, because it can't find "View A", since it's not created yet.
The error:
QueryFailedError: relation "public.course_item_view" does not exist
Solutions I have tried:
Renaming the ViewEntity files (to check if the system uses ABC ordering on file names)
Renaming the ViewEntity classes (to check if the system uses ABC ordering on class names)
Renaming the ViewEntity's "name" property (to check if the system uses ABC ordering on the final SQL view names)
Reordering the ViewEntity class references in the "entities: []" array of the connection options
Reordering the ViewEntity class imports in the file where I declare the connection options
Removing/Adding the file again (to check if the system uses Creation Date based ordering)
Modifying the files (to check if the system uses Modification Date based ordering)
All of these failed. I cannot figure out how the system determines the order in which the views are created.
Any help would be GREATLY appreciated!!
Expected Behavior
The views should be created in an order that is either specified by a property inside the views, or resolved automatically from the SELECT statements (a dependency graph), or based on the order in which I reference the ViewEntities in the "entities: []" array of the connection options; any other solution would be perfect where one could determine the order in which the ViewEntities are created.
Actual Behavior
The ViewEntities are created in an order that I honestly can't understand. Sometimes a dependent ViewEntity is created before the ViewEntity it depends on. This causes the synchronization to fail.
File name: "CourseItemView" which resolves to: "course_item_view"
@ViewEntity({
expression: `
SELECT
"uvcv"."userId",
"uvcv"."courseId",
"uvcv"."videoId",
CAST (null AS integer) AS "examId",
"uvcv"."isComplete" AS "isComplete"
FROM public.video_completed_view AS "uvcv"
UNION ALL
SELECT
"uecv"."userId",
"uecv"."courseId",
CAST (null AS integer) AS "videoId",
"uecv"."examId",
"uecv"."isCompleted" AS "isComplete"
FROM public.user_exam_completed_view AS "uecv"
`
})
File name: "CourseItemStateView" which resolves to: "course_item_state_view"
This DEPENDS on the "course_item_view", as you can see in the SQL
@ViewEntity({
expression: `
SELECT
"course"."id" AS "courseId",
"user"."id" AS "userId",
"civ"."videoId" AS "videoId",
"civ"."isComplete" AS "isVideoCompleted",
"civ"."examId" AS "examId",
"civ"."isComplete" AS "isExamCompleted"
FROM public."course"
LEFT JOIN public."user"
ON 1 = 1
LEFT JOIN public.course_item_view AS "civ" ------------------- HERE
ON "civ"."courseId" = "course"."id"
AND "civ"."userId" = "user"."id"
ORDER BY "civ"."videoId","civ"."examId"
`
})
My connection options:
const postgresOptions = {
// properties, passwords etc...
entities: [
// entities....
// ...
// ...
// views
VideoCompletedView,
UserExamCompletedView,
UserExamAnswerSessionView,
UserVideoMaxWatchedSecondsView,
CourseItemView, // -------------------------------- HERE
CourseItemStateView // --------------------------- HERE
],
} as ConnectionOptions;
createConnection(postgresOptions);
Steps to Reproduce
Create ViewEntities that depend on each other.
You will run into this issue, but it is hard to say exactly why and when; this is the main problem.

Referring to the source table name as a variable from models in dbt

I have declared a variable inside dbt_project.yml as
vars:
  deva: CEMD_RAW
and I am using it inside my model file as
select *
from {{ source("{{ var('deva')}}", 'table_name') }}
but when I compile the file it says a source named '{{ var('deva') }}.table_name' was not found. What is the correct way to refer to the variable?
Once you're in your Jinja 'double curlies', you don't need to nest another set of curlies inside them, aka Don't Nest Your Curlies.
The correct way to do this would be:
select *
from {{ source(var('deva'), 'table_name') }}
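For completeness: var('deva') resolves to CEMD_RAW, so this assumes a source of that name is declared somewhere in your project, along these lines (hypothetical file):

# models/sources.yml
sources:
  - name: CEMD_RAW
    tables:
      - name: table_name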

How to map two variables from SQL Query - MyBatis

So, I have this select query below, which joins two tables and retrieves a String:
<select id="getAppVerByConfId" parameterType="java.lang.String" resultType="java.lang.String">
SELECT t.app_ver
FROM
application a
JOIN transaction t on t.txn_id = a.txn_id
WHERE
a.confirmation_id = #{0}
</select>
and then I used that as a template to write a 2nd query, which is nearly identical but just retrieves a different column from the table.
<select id="getStepNameByConfId" parameterType="java.lang.String" resultType="java.lang.String">
SELECT t.step_name
FROM
application a
JOIN transaction t on t.txn_id = a.txn_id
WHERE
a.confirmation_id = #{0}
</select>
Both of these work fine on their own, and they're used at the same point in the program. But there's got to be a better way than this, surely? I should be able to make the query once and then map the results to what I want, correct? Should I make a result set and then pull the values out of it? Maybe as a HashMap, so I can retrieve the values by key? Is this a situation where I can use the AS operator, i.e. "SELECT t.app_ver AS appVersion"? My thinking is that that's for passing variables into the query, though, and not for getting them out?
If you have any thoughts on this I would love to hear them. I'm basically trying to combine these into one query, and I need to be able to retrieve the right value and not assign app_ver to step_name or vice versa.
Cheers
As you say, it's not a bad idea to use an alias (t.app_ver AS appVersion) in your select, but the alias only changes the name of the column that gets mapped. So if you use aliases like t.app_ver AS appVersion, t.step_name AS stepName, your column names will be appVersion and stepName.
Then, to map your result you have multiple choices. Mapping it into a map structure is not a bad idea, and it's easy: you just set your result type to hashmap, something like the following (no resultMap needed):
Hashmap
(Example from the official page)
<select id="selectPerson" parameterType="int" resultType="hashmap">
SELECT * FROM PERSON WHERE ID = #{id}
</select>
The column names will be the keys and the row values the values in the map; as the docs put it, the result is "keyed by column names mapped to row values".
So to get your values you use the column name as the key in the map:
String appVersionValue = (String) map.get("appVersion"); // values come back as Object, so cast
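Applied to your case, the two selects could be collapsed into a single hashmap query (a sketch; the statement id getAppInfoByConfId is made up):

<select id="getAppInfoByConfId" parameterType="java.lang.String" resultType="hashmap">
  SELECT t.app_ver AS appVersion, t.step_name AS stepName
  FROM application a
  JOIN transaction t ON t.txn_id = a.txn_id
  WHERE a.confirmation_id = #{0}
</select>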
Resultmap
Another way is to create a class with the properties you need to map, and then create your resultMap.
A resultMap is defined as follows:
resultMap – The most complicated and powerful element that describes how to load your objects from the database result sets.
Your class would look like this:
public class Application {
    private String appVersion;
    private String stepName;
    // ... getters and setters
}
And your resultMap would map the column names to the class properties, specifying the type as the class created for this purpose (in this case, Application):
<resultMap id="applicationResultMap" type="Application">
  <result property="appVersion" column="appVersion"/>
  <result property="stepName" column="stepName"/>
</resultMap>
(Be careful: in this example the columns and properties happen to have the same names. There are cases where the column is called app_version and the property appVersion, for example, in which case you would use <result property="appVersion" column="app_version"/>.)
Finally, in your select you specify this resultMap:
<select id="selectMethodName" resultMap="applicationResultMap">
  select t.app_ver as appVersion, t.step_name as stepName
  from your_table
</select>
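Calling it from Java then gives you a typed object instead of a map (a sketch; the SqlSession wiring and the confirmationId parameter are assumed):

// Hypothetical usage with a MyBatis SqlSession bound to this mapper XML
Application app = session.selectOne("selectMethodName", confirmationId);
String appVersion = app.getAppVersion();
String stepName = app.getStepName();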

Django ORM Cross Product

I have three models:
class Customer(models.Model):
    pass

class IssueType(models.Model):
    pass

class IssueTypeConfigPerCustomer(models.Model):
    customer = models.ForeignKey(Customer)
    issue_type = models.ForeignKey(IssueType)

    class Meta:
        unique_together = [('customer', 'issue_type')]
How can I find all tuples of (customer, issue_type) for which there is no IssueTypeConfigPerCustomer object?
I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
Background: for every customer and for every issue-type, there should be a config in the DB.
If you can afford to make one database trip for each issue type, try something like this untested snippet:
def lacking_configs():
    for issue_type in IssueType.objects.all():
        # customers that have no config row for this particular issue type
        for customer in Customer.objects.exclude(
            issuetypeconfigpercustomer__issue_type=issue_type
        ):
            yield customer, issue_type
missing = list(lacking_configs())
This is probably OK unless you have a lot of issue types or if you are doing this several times per second, but you may also consider having a sensible default instead of making a config object mandatory for each combination of issue type and customer (IMHO it is a bit of a design-smell).
[update]
I updated the question: I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
In Django, every QuerySet is either a list of Model instances or a dict (values querysets), so it is impossible to return the format you want (a list of tuples of Model instances) without some Python (and possibly multiple trips to the database).
The closest thing to a cross product would be using the "extra" method without a where parameter, but it involves raw SQL and knowing the underlying table name for the other model:
missing = Customer.objects.extra(
    select={"issue_type_id": 'appname_issuetype.id'},
    tables=['appname_issuetype']
)
As a result, each Customer object will have an extra attribute, "issue_type_id", containing the id of one IssueType. You can use the where parameter to filter based on NOT EXISTS (SELECT 1 FROM appname_issuetypeconfigpercustomer WHERE issuetype_id=appname_issuetype.id AND customer_id=appname_customer.id). Using the values method you can get something close to what you want; this is probably enough information to verify the rule and create the missing records. If you need other fields from IssueType, just include them in the select argument.
In order to assemble a list of (Customer, IssueType) you need something like:
cross_product = [
    (customer, IssueType.objects.get(pk=customer.issue_type_id))
    for customer in Customer.objects.extra(
        select={"issue_type_id": 'appname_issuetype.id'},
        tables=['appname_issuetype'],
        where=["""
            NOT EXISTS (
                SELECT 1
                FROM appname_issuetypeconfigpercustomer
                WHERE issuetype_id=appname_issuetype.id
                AND customer_id=appname_customer.id
            )
        """]
    )
]
Not only does this require the same number of trips to the database as the "generator"-based version, but IMHO it is also less portable, less readable, and violates DRY. I guess you can lower the number of database queries to a couple using something like this:
missing = Customer.objects.extra(
    select={"issue_type_id": 'appname_issuetype.id'},
    tables=['appname_issuetype'],
    where=["""
        NOT EXISTS (
            SELECT 1
            FROM appname_issuetypeconfigpercustomer
            WHERE issuetype_id=appname_issuetype.id
            AND customer_id=appname_customer.id
        )
    """]
)
issue_list = dict(
    (issue.id, issue)
    for issue in IssueType.objects.filter(
        pk__in=set(m.issue_type_id for m in missing)
    )
)
cross_product = [(c, issue_list[c.issue_type_id]) for c in missing]
Bottom line: in the best case you make two queries at the cost of legibility and portability. Having sensible defaults is probably a better design compared to mandatory config for each combination of Customer and IssueType.
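For reference, if a couple of plain ORM queries plus an in-memory pairing are acceptable (it is still a Python loop, just not one query per issue type), a sketch without extra/raw SQL:

# Three queries total: one per table, then set arithmetic in memory
existing = set(
    IssueTypeConfigPerCustomer.objects.values_list('customer_id', 'issue_type_id')
)
customers = list(Customer.objects.all())
issue_types = list(IssueType.objects.all())
missing = [
    (customer, issue_type)
    for customer in customers
    for issue_type in issue_types
    if (customer.id, issue_type.id) not in existing
]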
This is all untested, sorry if some homework was left for you.