How to access model metadata from custom materialization


I am currently writing a custom dbt materialization and I would like to know what is the best way / pattern to access the "current" model metadata from the materialization itself.
Background
My model consists of two files:
sample_model.yaml (with the model metadata)
version: 2
models:
  - name: sample_model
    description: This is a test view to test a materialization
    config:
      schema: temp
      materialized: custom_view
    columns:
      - name: custom_view_column_a
        description: This is a test column A in a view
      - name: custom_view_column_b
        description: This is a test column B in a view
sample_model.sql (with the "actual" model)
SELECT
    1 AS custom_view_column_a,
    2 AS custom_view_column_b
My solution
In my custom materialization (custom_view) I would like, for example, to access the columns defined in the model metadata (sample_model.yaml). For the moment I could access them using the graph variable, in this way:
{% set models = [] %}
{% for node in graph.nodes.values() | selectattr("resource_type", "equalto", "model") | selectattr("name", "equalto", this.identifier) %}
    {% do models.append(node) %}
{% endfor %}
{% set model_metadata = models | first %}
{% set model_columns = model_metadata.get("columns") %}
Possible improvements
This approach works quite well, however it "feels" a bit like (ab)using a sort of "global variable". Also the graph could become very large considering it stores both the metadata and the SQL of all the models in the project!
Is there any other (local) object or variable I can access from the materialization that stores only the metadata of the model currently being materialized?

{{ model }} gives you the data from the graph for the current model node. I think it should work inside a materialization:
{% set model_metadata = model %}
You may want to gate it with execute, since I'm not really sure whether the first parsing pass renders the materialization code:
{% set model_metadata = model if execute else {} %}
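To illustrate the difference between the two lookups, here is a plain-Python stand-in (it is not dbt code). The dictionaries only mimic the general shape of graph.nodes and of the current model node; the project name my_project and both helper functions are assumptions made up for this sketch.

```python
# Plain-Python stand-in for the two lookups; not dbt code.
# "my_project" and both helper functions are hypothetical.
graph_nodes = {
    "model.my_project.sample_model": {
        "resource_type": "model",
        "name": "sample_model",
        "columns": {
            "custom_view_column_a": {"description": "This is a test column A in a view"},
            "custom_view_column_b": {"description": "This is a test column B in a view"},
        },
    },
    "model.my_project.other_model": {
        "resource_type": "model",
        "name": "other_model",
        "columns": {},
    },
}


def find_by_scan(nodes, identifier):
    """Mimic the question's approach: filter every node by type and name."""
    matches = [
        node for node in nodes.values()
        if node["resource_type"] == "model" and node["name"] == identifier
    ]
    return matches[0] if matches else {}


def column_names(node):
    """Return the column names documented on a node, in definition order."""
    return list(node.get("columns", {}))


# In a materialization, `model` is already the current node, so no scan is needed.
model = graph_nodes["model.my_project.sample_model"]
scanned = find_by_scan(graph_nodes, "sample_model")
print(column_names(model))
```

Both lookups end up at the same node; the direct one simply avoids walking every node in the project.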

Related

Configuration Change for Incremental Model on DBT

On one of our previously created incremental models, I added the partition_by and partition_expiration_days parameters to the configuration to set up table partitioning and retention.
{{ config(
    materialized='incremental',
    unique_key='record_id',
    on_schema_change='append_new_columns',
    partition_by={
        "field": "row_ts",
        "data_type": "timestamp",
        "granularity": "day"
    },
    partition_expiration_days=365
) }}
On the next run I observed that the configuration had not been applied to the table.
It seems a full-refresh operation is needed here. However, the data source for this table has a strict retention policy, so some of the data would be lost with a full refresh.
Could anyone please let me know how this issue can be addressed?

How to efficiently retrieve a list of all collections a product belongs to in Shopify?

I want to create a CSV export of product data from a Shopify store. For each product I'm exporting data like the product name, price, image URL, etc. In this export I also want to list, for each product, all the collections the product belongs to, preferably in the hierarchical order the collections appear in the site's navigation menu (e.g. Men > Shirts > Red Shirts).
If my understanding of the API is correct, for each product I need to make a separate call to the Collect API to get a list of collections it belongs to then another call to the Collections API to get the handle of each collection. This sounds like a lot of API calls for each product.
Is there a more efficient way to do this?
Is there any way to figure out the aforementioned hierarchy of collections?

Unfortunately, as you pointed out, I don't think there is an efficient way of doing this because of the way that the Shopify API is structured. It does not permit collections to be queried from products, rather only products queried from collections. That is, one can't see what collections a product belongs to, but can see what products belong to a collection.
Neither the ShopifyAPI::Collect nor the ShopifyAPI::Collection REST resource returns product variant information, which is needed to get the price as per the requirements. Furthermore, ShopifyAPI::Collect is limited to custom collections only, and would not work for products in a ShopifyAPI::SmartCollection. For this reason I suggest using GraphQL instead of REST to get the information needed.
query ($collectionCursor: String, $productCursor: String) {
  collections(first: 1, after: $collectionCursor) {
    edges {
      cursor
      node {
        id
        handle
        products(first: 8, after: $productCursor) {
          edges {
            cursor
            node {
              id
              title
              variants(first: 100) {
                edges {
                  node {
                    price
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
{
  "collectionCursor": null,
  "productCursor": null
}
The $productCursor variable can be used to iterate over all of the products in a collection, and $collectionCursor to iterate over all collections. Note that only the first 100 variants need to be queried, since Shopify has a hard limit of 100 variants per product.
The same query can be used to iterate over smart collections (ShopifyAPI::SmartCollection).
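The two-cursor iteration described above can be sketched in Python as follows. This is only an illustration under stated assumptions: fetch_page is a hypothetical in-memory stand-in for one GraphQL request (it does not call Shopify), its cursors are plain list indexes, and the store data is invented. The point is the loop structure and the final inversion from "collection to products" into "product to collections".

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the store: collection handle -> product titles.
FAKE_STORE = {
    "men": ["Red Shirt", "Blue Shirt", "Chinos"],
    "shirts": ["Red Shirt", "Blue Shirt"],
    "red-shirts": ["Red Shirt"],
}


def fetch_page(collection_cursor, product_cursor, page_size=2):
    """Hypothetical stand-in for one GraphQL request.

    Returns the first collection after collection_cursor together with up to
    page_size of its products after product_cursor, or None when no
    collection is left. Cursors here are plain list indexes.
    """
    handles = list(FAKE_STORE)
    index = 0 if collection_cursor is None else collection_cursor + 1
    if index >= len(handles):
        return None
    products = FAKE_STORE[handles[index]]
    start = 0 if product_cursor is None else product_cursor + 1
    page = products[start:start + page_size]
    return {
        "collection": handles[index],
        "collection_cursor": index,
        "products": page,
        "last_product_cursor": start + len(page) - 1,
        "has_next_page": start + len(page) < len(products),
    }


def collections_by_product():
    """Walk every collection and product page, then invert the mapping."""
    result = defaultdict(list)
    collection_cursor = None
    while True:
        page = fetch_page(collection_cursor, None)
        if page is None:
            break
        while True:
            for title in page["products"]:
                result[title].append(page["collection"])
            if not page["has_next_page"]:
                break
            page = fetch_page(collection_cursor, page["last_product_cursor"])
        collection_cursor = page["collection_cursor"]
    return dict(result)


print(collections_by_product())
```

The collections listed for each product appear in the order the collections were iterated, which is not necessarily the navigation-menu hierarchy asked about in the question.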
Alternatively the same query using the REST API would look something like this in Ruby.
collections = ShopifyAPI::Collection.all # paginate
collections.each do |collection|
  collection.products.each do |product|
    product.title
    # note the extra call to the Product API to get variant info
    ShopifyAPI::Product.find(product.id).variants.each do |variant|
      variant.price
    end
  end
end
I don't see any way to address the inefficiencies with the REST query, but you might be able to improve on the GraphQL queries by using Shopify's GraphQL Bulk Operations.

Do vue.js filters do anything that nested methods can't?

I'm wondering whether vue.js filters achieve something that nested methods could not. Offhand, as a vue.js newbie, it seems like extra syntax for no real purpose. E.g., instead of this code using a "capitalize" function defined in filters:
{{ key | capitalize }}
I would just write this, and move the "capitalize" function into the "methods" section rather than "filters":
{{ capitalize(key) }}
Is there a use case where nested methods fall short, or are filters just syntactic sugar? (If the latter, they are not sweet enough for my taste, but I hope this question can help develop my palate.)
This code came from the vue.js reference here https://v2.vuejs.org/v2/examples/grid-component.html

Filters are nothing but JavaScript functions, as you mentioned.
So they can just as well be defined as ordinary functions inside methods.
I would still recommend using filters, since it is good to keep different kinds of logic in separate places and to make full use of what the framework offers.
One use case is when you need to apply several filters or manipulations in sequence: filters come in handy there because Vue chains them for you.
filters: {
  removespace: function (value) {
    return value.replace(/\s/g, '')
  },
  lowercase: function (value) {
    return value.toLowerCase()
  }
}
<p>{{ message | lowercase | removespace }}</p>
Or a built-in filter on events (this is older Vue syntax):
<input v-on="keyup:myFunction | key enter">
<!--myFunction will be called only when the enter key is pressed.-->
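As a language-neutral illustration of the chaining point, the following Python sketch shows that a left-to-right filter chain is equivalent to nested function calls. The pipe helper is a hypothetical name introduced here; it is not part of Vue.

```python
import re
from functools import reduce


def lowercase(value):
    """Return the value in lower case."""
    return value.lower()


def removespace(value):
    """Drop every whitespace character, like replace(/\\s/g, '') in JavaScript."""
    return re.sub(r"\s", "", value)


def pipe(value, *filters):
    """Apply filters left to right, like `value | f1 | f2` in a template."""
    return reduce(lambda acc, f: f(acc), filters, value)


message = "Hello Vue World"
chained = pipe(message, lowercase, removespace)
nested = removespace(lowercase(message))
print(chained == nested)
```

The chained form reads in the order the transformations are applied, while the nested form reads inside out; the result is the same either way.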

Where to write predefined queries in django?

I am working with a team of engineers, and this is my first Django project.
Since I have done SQL before, I chose to write the predefined queries that the front-end developers are supposed to use to build this page (result set paging, simple find etc.).
I just learned Django QuerySet, and I am ready to use it, but I do not know on which file/class to write them.
Should I write them as methods inside each class in models.py? Django documentation simply writes them in the shell, and I haven't read it say where to put them.

Generally, the Django pattern is that you write your queries in your views, in the views.py file. There you take each of your predefined queries for a given URL and return a response that either renders a template (which presumably your front-end team will build with you) or returns JSON (for example through Django Rest Framework for an SPA front-end).
The tutorial is strong on this, so that may be a better bet for where to put things than the docs themselves.
Queries can be run anywhere, but Django is built to receive requests through the URL schema and return a response. This is typically done in views.py, and each view is generally called by a line in the urls.py file.
If you're particularly interested in following the fat models approach and putting them there, then you might be interested in Manager objects, which define the querysets that you get through, for example, MyModel.objects.all().
My example view (a class-based view, which provides information about a list of matches):
from rest_framework import generics

class MatchList(generics.ListCreateAPIView):
    """
    List all matches, or create a new Match.
    """
    queryset = Match.objects.all()
    serializer_class = MatchSerialiser
That queryset could be anything, though.
A function based view with a different queryset would be:
def event(request, event_slug):
    from .models import Event, Comment, Profile
    event = Event.objects.get(event_url=event_slug)
    future_events = Event.objects.filter(date__gt=event.date)
    comments = Comment.objects.select_related('user').filter(event=event)
    final_comments = []
    return render(request, 'core/event.html', {"event": event, "future_events": future_events})
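edit: That second example is quite old, and the related comments would be better loaded together with the events. Because comments point to an event through a foreign key, this is a reverse relation, which needs prefetch_related rather than select_related (the name 'comment_set' is Django's default reverse name and assumes no related_name was set on that foreign key):
future_events = Event.objects.filter(date__gt=event.date).prefetch_related('comment_set')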
Edit edit: It's worth pointing out, QuerySet isn't a language, in the way that you're using it. It's django's API for the Object Relational Mapper that sits on top of the database, in the same way that SQLAlchemy also does - in fact, you can swap out or use SQLAlchemy instead of using the Django ORM, if you really wanted. Mostly you'll hear people talking about the Django ORM. :)

If you have some model SomeModel and you wanted to access its objects via a raw SQL query you would do: SomeModel.objects.raw(raw_query).
For example: SomeModel.objects.raw('SELECT * FROM myapp_somemodel')
https://docs.djangoproject.com/en/1.11/topics/db/sql/#performing-raw-queries

Django file structure:
app/
    models.py
    views.py
    urls.py
    templates/
        app/
            my_template.html
In models.py:
from django.db import models

class MyModel(models.Model):
    # field definitions and relations go here
    pass
In views.py:
from django.shortcuts import render
from .models import MyModel

def my_view(request):
    my_model = MyModel.objects.all()  # here you use the querysets
    # pass the queryset to the template
    return render(request, 'app/my_template.html', {'my_model': my_model})
In urls.py:
from django.conf.urls import url
from .views import my_view

urlpatterns = [
    # here you write the URL that points to your view
    url(r'^myurl/$', my_view, name='my_view'),
]
And finally in my_template.html:
{# display the data using the Django template language #}
{% for obj in my_model %}
<p>{{ obj }}</p>
{% endfor %}

Django 1.6 ORM Joins

I'm trying to get all of the assets in a particular portfolio to display on a page.
I need to know: A) how to get that portfolio's primary key, B) how to write a join in code, and C) whether I'm even going about this in the right way (would another CBV or FBV be more appropriate, or is the function get_assets() fine?).
Database setup:
class Portfolios(models.Model):
    #code
class PortfoliosAssets(models.Model):
    portfolio = models.ForeignKey(Portfolios)
    asset = models.ForeignKey(Assets)
class Assets(models.Model):
    #code
SQL I want to write with the ORM:
SELECT A.ticker
FROM assets A
INNER JOIN portfolios_assets PA ON PA.asset = A.id
WHERE PA.portfolio = --portfolio_pk
Code:
class ShowPortfolios(DetailView):
    model = Portfolios
    template_name = 'show_portfolios.html'
    def get_assets(self):
        #obviously not how to get the portfolios pk or columns from the ASSETS table.
        assets = PortfoliosAssets.objects.get(portfolio=portfolio_pk)
        for asset in assets:
            #run some query to get each asset's info but this seems obviously wrong.

The relationship between Portfolio and Asset is many-to-many. You should define that explicitly, and remove the PortfoliosAssets model completely (Django will create an equivalent m2m join table for you).
class Portfolio(models.Model):
    assets = models.ManyToManyField("Asset")
(Note the convention is to use singular names for models, not plurals.)
Once that's done, you don't need an extra method at all: you can simply access the assets from the portfolio via portfolio.assets.all(). Or, in the template:
{% for asset in portfolio.assets.all %}
{{ asset.ticker }}
{% endfor %}
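To make the join-table idea concrete without a Django project, here is a small sketch using Python's built-in sqlite3 module. The table and column names (portfolio, asset, portfolio_assets) and the sample rows are assumptions chosen for illustration; the names Django actually generates depend on the app label and model names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE portfolio (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE asset (id INTEGER PRIMARY KEY, ticker TEXT);
    -- Join table of the kind a many-to-many field implies (names are assumed).
    CREATE TABLE portfolio_assets (
        portfolio_id INTEGER REFERENCES portfolio(id),
        asset_id INTEGER REFERENCES asset(id)
    );
    INSERT INTO portfolio VALUES (1, 'Growth'), (2, 'Income');
    INSERT INTO asset VALUES (1, 'AAPL'), (2, 'MSFT'), (3, 'T');
    INSERT INTO portfolio_assets VALUES (1, 1), (1, 2), (2, 3);
""")


def tickers_for(portfolio_pk):
    """Return the tickers held by one portfolio, via the join table."""
    rows = conn.execute(
        """
        SELECT a.ticker
        FROM asset a
        INNER JOIN portfolio_assets pa ON pa.asset_id = a.id
        WHERE pa.portfolio_id = ?
        ORDER BY a.ticker
        """,
        (portfolio_pk,),
    )
    return [ticker for (ticker,) in rows]


print(tickers_for(1))
```

This is the same join the question wanted to write by hand; with a ManyToManyField, portfolio.assets.all() asks the ORM to build an equivalent query.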
