Incremental load with BigQuery isn't working with dbt unique_key

I'm trying to build an incremental model using dbt on BigQuery, and it seems like I'm missing some detail in my code or in the way incremental models work.
I am building a unique hash of my five functional keys, which are POSStoreCode, StoreSystemType, TargetStartDate, TargetEndDate and TargetSalesType.
My incremental model looks like this.
{{
    config(
        materialized='incremental',
        alias='Sales_Target_Oblique_v1',
        schema='bq_ed_harmonized_' + var('env'),
        unique_key='Checksum'
    )
}}

with salestarget as (

    select
        store_code as POSStoreCode,
        store_name as StoreName,
        'SID' as StoreSystemType,
        target_qty as TargetQuantity,
        target_start_date as TargetStartDate,
        target_end_date as TargetEndDate,
        concat('Women ', ProductCategory) as TargetSalesType,
        cast(NULL as float64) as TargetAmount,
        '{{ var("transaction_id") }}' as ProcessingId,
        CURRENT_DATETIME() as CreationDatetime,
        CURRENT_DATETIME() as LastUpdateDatetime
    from {{ ref('salestarget') }}

),

harmonized_salestarget as (

    select
        *,
        {{ dbt_utils.surrogate_key(['POSStoreCode', 'StoreName', 'StoreSystemType', 'TargetQuantity', 'TargetStartDate', 'TargetEndDate', 'TargetSalesType', 'TargetAmount']) }} as Checksum
    from salestarget

)

select * from harmonized_salestarget

{% if is_incremental() %}
where Checksum != {{ dbt_utils.surrogate_key(['POSStoreCode', 'StoreName', 'StoreSystemType', 'TargetQuantity', 'TargetStartDate', 'TargetEndDate', 'TargetSalesType', 'TargetAmount']) }}
{% endif %}
When I first execute the model, it works well: it creates the table Sales_Target_Oblique_v1 and loads the data into it. When I re-execute it, it doesn't merge anything, which is normal.
Now I'm manually updating the source table {{ ref('salestarget') }} on some values, for example setting a new value for TargetQuantity. When I execute the model again, it is supposed to detect that a change happened and merge it, since the newly calculated checksum will differ from the old one. But it doesn't work: it merges 0 rows at the end.
Is there any problem with my checksum?

To add data to a dbt incremental model, you need to return only the new rows you would like appended (or updated) to the existing table.
In your case, you are returning no new rows because you are comparing the checksum of the query to itself (because you're selecting from harmonized_salestarget). Effectively you're saying where 1 != 1.
Your checksum definition at the top of the file is correct - what you want to do is change the bottom part of your query to find records that have changed since the last run of the model. Do you have a modified date column you can filter on, for example?
These docs will help get you on the right track: https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models
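For example, a minimal sketch of that filter, assuming the source carried a real modification timestamp (modified_date below is a hypothetical column name; substitute your own):

select * from harmonized_salestarget
{% if is_incremental() %}
-- only pick up rows touched since the previous run of this model
where modified_date > (select max(modified_date) from {{ this }})
{% endif %}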

Related

Filtering for new rows using dbt incremental model

I am new to dbt and SQL in general. I am building an incremental model in dbt that gets run daily. I have a table on Snowflake, for example like this:
Nat_K1 | Nat_k2 | Product | Supplier | Component | Metadata
6231 | ~~ | Toy Car | ToysRus | Base | {Hash:2bbd4cb604298556a40f16c50218b226, Load_time: 01:01:2023}
6231 | ~~ | Toy Car | ToysRus | Wheels | {Hash:fd827e6fe9105024dfc5e58d21cde9bd, Load_time: 01:01:2023}
6231 | ~~ | Toy Car | ToysRus | Remote | {Hash:6dfddb68e3219fa66af182f19b1c2ddf, Load_time: 01:01:2023}
I have a view that gets data from my source. The incremental model is a CTAS using this view. I want to make sure no duplicates are being added. How do I make dbt check the hash values already in the table before inserting? The hash is made up of Product, Supplier and Component, and makes up the Metadata variant column.
For example, if I do dbt run the next day, this row would be added, which I want to avoid:
Nat_K1 | Nat_k2 | Product | Supplier | Component | Metadata
6231 | ~~ | Toy Car | ToysRus | Base | {Hash:2bbd4cb604298556a40f16c50218b226, Load_time: 02:01:2023}
I am unsure of the best approach to handle this.
Incremental models will automatically update records based on the unique_key. However, if you want to keep the timestamp, you have to get a little more creative.
Option 1: Join with self
{{
    config({
        "materialized": 'incremental',
        "unique_key": 'HASH_FIELD',
    })
}}

SELECT
    S.HASH_FIELD,
    S.NAT_K1,
    S.NAT_K2,
    S.PRODUCT,
    S.SUPPLIER,
    S.COMPONENT,
    {% if is_incremental() %}
    -- keep the load time already stored for existing rows;
    -- only brand-new rows get the current timestamp
    IFNULL(T.LOAD_TIME, CURRENT_TIMESTAMP()) LOAD_TIME
    {% else %}
    CURRENT_TIMESTAMP() LOAD_TIME
    {% endif %}
FROM {{ ref('source') }} S
{% if is_incremental() %}
LEFT JOIN {{ this }} T ON S.HASH_FIELD = T.HASH_FIELD
{% endif %}
Option 2: Use not exists
{{
    config({
        "materialized": 'incremental',
        "unique_key": 'HASH_FIELD',
    })
}}

SELECT
    S.HASH_FIELD,
    S.NAT_K1,
    S.NAT_K2,
    S.PRODUCT,
    S.SUPPLIER,
    S.COMPONENT,
    CURRENT_TIMESTAMP() LOAD_TIME
FROM {{ ref('source') }} S
{% if is_incremental() %}
-- skip any row whose hash already exists in the target table
WHERE NOT EXISTS (SELECT 1 FROM {{ this }} WHERE HASH_FIELD = S.HASH_FIELD)
{% endif %}
Use option 1 if you need to change other fields that may have updated but you don't want to change the load time. Use option 2 if you want to only load new records.
EDIT: I assume the timestamp you're looking to use is not available in the source table. However, if it is, then you just want to use the common strategy for incremental models, which is to look for the last load timestamp in the source table. Something like this:
SELECT ...
FROM {{ ref('source') }} S
{% if is_incremental() %}
WHERE S.LOAD_TIME > (SELECT MAX(LOAD_TIME) FROM {{ this }})
{% endif %}

Show field based on value of another field

I've been trying to show the order confirmation date in the sale order tree view.
To show it I used date_order like this:
<field name="date_order" string="Confirmation Date" optional="show" attrs="{'invisible':[['state','not in',['sale', 'done']]]}" />
In our setup, orders are created manually but also synced from WooCommerce, and in those synced orders from the web shop date_order is always before the create date.
For example, an order is created (create_date) 10.08.2022 17:10:20 and confirmed automatically (date_order) 10.08.2022 17:10:11.
Somehow, it is confirmed 9 seconds before it is created. Now, I'd like to show date_order only when its value is > create_date. In other cases I'd like to display create_date in its place.
I tried to use attrs for this, though I'm not sure if it's even possible:
<field name="date_order" string="Confirmation Date" optional="show" attrs="{'invisible':['&', ['state','not in',['sale', 'done']], ['date_order'], '>', ['create_date']]}" />
The code above gives an XMLSyntaxError. I'm not sure if it's possible to compare values in that manner - and finally how to get one or the other value - I guess my second approach is maybe better.
In the second approach I tried to create a computed field, like this:
date_order_mine = fields.Char("Potvrdjeeeno", compute="comp_date")

@api.depends('create_date', 'date_order', 'order_line')
def comp_date(self):
    for order in self:
        for line in order.order_line:
            if line.create_date < line.date_order:
                return line.date_order
            else:
                return line.create_date
This code gives me AttributeError: 'sale.order.line' object has no attribute 'date_order'.
Since I'm so new to Odoo and Python development, I'm not sure what I should do here to compare the values of these fields and return one or the other based on conditions - if someone can help I will appreciate it.
The domain used in attrs is not valid; domains should be a list of criteria, each criterion being a triple (either a list or a tuple) of:
(field_name, operator, value)
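For example, the state criterion from your first snippet is already a valid triple on its own:
attrs="{'invisible': [('state', 'not in', ['sale', 'done'])]}"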
In your second approach, the date_order field is on the parent model sale.order (order); to access the date order from order lines, use the order_id field, like the following:
line.order_id.date_order
To set the value of date_order_mine to create_date or date_order, you do not need the order lines, and the compute method should assign the computed value:
Computed Fields
Fields can be computed (instead of read straight from the database) using the compute parameter. It must assign the computed value to the field.
Example:
date_order_mine = fields.Datetime("Potvrdjeeeno", compute="comp_date")

@api.depends('create_date', 'date_order')
def comp_date(self):
    for order in self:
        if order.create_date < order.date_order:
            order.date_order_mine = order.date_order
        else:
            order.date_order_mine = order.create_date

dbt: how to handle the same query with multiple if-else conditions

One of my very first dbt questions, as I am also new to this framework.
I have the following query, and I have a question about the best approach to write this better.
I used var('partner') with the uuids for the status column, and also for the temp table name 'points_{{partner}}'.
Question:
What's the best way to structure and organize this to handle the said 20+ partners, instead of duplicating the same query across 20 SQL files? I am using 'points_{{partner}}' here, but ultimately I want to have many of these partner-specific views stored in the destination.
Can I just put the partners in schema.yml or other dbt files, so that I can just load and reference them? Any example of how to do this? That way, I was thinking, I would not need multiple if-else statements to handle 20+ partners, and instead it is just simply {{partner}}.
WITH 'points_{{partner}}' AS (
    SELECT
        TO_CHAR(
            TO_DATE(points_timestamp, 'YYYY-MM-DD'),
            'YYYY-MM'
        ) AS "months",
        SUM(points_amount) AS "points_amount",
        CASE
            {% if var('partner') == 'nike' %}
            WHEN uuid = '00000000-d64b-46ea-8454-428279b15064' THEN 'OK'
            WHEN uuid = '11111111-dc9a-493a-b1c0-6a798a4889ac' THEN 'NOT_OK'
            {% elif var('partner') == 'puma' %}
            WHEN uuid = '22222222-9644-4c6f-bcb6-57ae8401dfc0' THEN 'OK'
            WHEN uuid = '33333333-af79-4364-8b26-c8106627c937' THEN 'NOT_OK'
            {% endif %}
        END AS "status"
    FROM
        dbt.raw_points
    WHERE
        {% if var('partner') == 'nike' %}
        partner_uuid = '88888888-cfd3-47f4-b6da-447401aefbae'
        {% elif var('partner') == 'puma' %}
        partner_uuid = '99999999-f345-43e8-a335-a1268969095e'
        {% endif %}
    GROUP BY
        months,
        points_amount,
        status
    ORDER BY
        months DESC
)
SELECT * FROM 'points_{{partner}}'
Right now you have a single CTE that templates all partners into that one object.
An alternative would be a templated CTE that unions each partner together.
(This example assumes that you have each raw_points table within a unique schema per partner - a soft single-tenancy model for the warehouse.)
Example:
{% set partners = dbt_utils.get_column_values(
    table=ref('my_distinct_partners'),
    column='partner_name',
    max_records=50,
    where="partner_status = 'OK'"
) %}

{% if partners %}
with partner_group_points as (

    {% for partner in partners %}
    SELECT
        TO_CHAR(TO_DATE(points_timestamp, 'YYYY-MM-DD'), 'YYYY-MM') AS "months",
        '{{ partner }}' as partner,
        SUM(points_amount) AS "points_amount",
        <custom calculation for status here> as status
    FROM
        `{{ target.project }}.platform_data_{{ partner }}.raw_points`
    GROUP BY
        months,
        partner,
        status
    {% if not loop.last %} UNION ALL {% endif %}
    {% endfor %}

)

SELECT
    months,
    partner,
    sum(points_amount) as points_amount,
    status
FROM partner_group_points
GROUP BY months, partner, status
ORDER BY months DESC
{% endif %}
I'm completely making this up since I don't know your exact source data, but you probably get the gist.
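For the second part of the question (putting the partners in a dbt file instead of if-else chains), one hedged sketch is to keep the partner-to-UUID mapping as a var in dbt_project.yml; the var name partner_uuids is made up here, but the UUIDs are the ones from your query:

# dbt_project.yml
vars:
  partner_uuids:
    nike: '88888888-cfd3-47f4-b6da-447401aefbae'
    puma: '99999999-f345-43e8-a335-a1268969095e'

Then the model can index into the mapping instead of branching:

WHERE
    partner_uuid = '{{ var("partner_uuids")[var("partner")] }}'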

How to run a distinct SELECT query inside a Laravel blade file inside a foreach loop?

I have two tables:
women-dress
women-dress_attributes
women-dress has price and color, and women-dress_attributes has display_name, attribute_name and filter.
filter is a simple Y or N value.
A row in women-dress_attributes looks like:
Price price Y
Since price has a filter value of Y, I need to load distinct price values from women-dress.
A part of my controller looks like this:
$Filter_query = DB::table($attribute_table_name)
    ->select('display_name', 'attribute')
    ->where('filter', '=', 'Y')
    ->get();
And inside my blade file I have:
@foreach($Filter_query as $agm)
<h4>{{ $agm->display_name }}</h4>
@endforeach
Now, how can I run another SQL select distinct query to load the corresponding values inside the blade file?
Or is there any better practice?
The proper way to do this in Laravel is to create a model for each of your two tables, and create a relationship function (i.e. dress()) between them.
Then you will be able to call $agm->dress->price in your blade to get the price.
Read this for creating models
And this for creating relationships between your models
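Short of full models and relationships, here is a controller-side sketch that preloads the distinct values before the blade renders - a different technique from the relationship approach above, and the column names follow the question's prose, so verify them against your schema:

use Illuminate\Support\Facades\DB;

$Filter_query = DB::table('women-dress_attributes')
    ->select('display_name', 'attribute_name')
    ->where('filter', 'Y')
    ->get();

// attach the distinct values for each filterable column up front,
// so the blade only renders and never runs its own queries
foreach ($Filter_query as $agm) {
    $agm->values = DB::table('women-dress')
        ->distinct()
        ->pluck($agm->attribute_name);
}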

Django: best way to design one-to-many count viewing

I have 2 models, A and B, and one A object may be referenced by several B objects:
class A(Model):
    name = CharField(...)

class B(Model):
    name = CharField(...)
    a = ForeignKey(A, related_name='all_B')
In the view of model A I want to show how many B objects there are.
For now I do this:
args = {'a_all': A.objects.all()}
...
{% for a in a_all %}
{{ a.name }} : {{ a.all_B.count }} <br>
{% endfor %}
But this will do an SQL query for every A object, and that is not cool if I have many rows in the db tables.
So, I want to fetch all counts in only one query.
select_related can't be used in this case, because it works only for one-to-one and many-to-one relations, not for one-to-many.
The only thing that comes to my head is to add a counter field to A:
class A(Model):
    name = CharField(...)
    b_count = PositiveIntegerField(...)
And update it whenever I change the relation. But that brings a lot of work to detect all relation changes if there are many views that add/delete/rewrite the "a" field of B.
Try this:

from django.db.models import Count

a_all = A.objects.all().annotate(b_count=Count('all_B'))
This will add a new field b_count to every object of A.
Then in your template you can do something like
{% for a in a_all %}
{{ a.name }} : {{ a.b_count }} <br>
{% endfor %}
Try:

A.objects.all().annotate(b_count=Count('all_B'))

Then for each instance of A, you can do a_instance.b_count.
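As a quick usage sketch of that (note the Count() lookup referencing the related_name all_B from the question's models):

from django.db.models import Count

# a single query in total, however many A rows there are
for a_instance in A.objects.annotate(b_count=Count('all_B')):
    print(a_instance.name, a_instance.b_count)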