I am trying to update a table in BigQuery using dbt. The command below executes in BigQuery:
UPDATE {{ ref('my_table') }}
SET variable = 'value'
WHERE lower(variable) LIKE '%xx%' OR lower(variable) LIKE '%yy%'
However, when I run it in dbt, I get the following error:
Server error: Database Error in rpc request (from remote system)
Syntax error: Expected end of input but got keyword LIMIT at [4:1]
Does anyone know why this is happening and how to resolve it?
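It's a little unintuitive at first, I know, but with dbt every model is a SELECT statement. The LIMIT in the error comes from the dbt IDE, which appends a LIMIT clause to the query it previews; that clause is invalid after an UPDATE.
Instead, think of doing something like: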
with cte as (
select * from {{ ref('my_table') }}
where <criteria>
)
select col1,
col2,
'value' as col3
from cte
Or possibly even simpler:
SELECT
'value' as variable
FROM {{ ref('my_table') }}
WHERE lower(variable) LIKE '%xx%' OR lower(variable) LIKE '%yy%'
This works because, during a dbt run, the new values are materialized into the new model. Note that this simpler version keeps only the matching rows; use a CASE expression if the model should contain every row of my_table.
However, if you are looking for ways to clean underlying tables in a DRY way, I'd highly recommend the thread "Modeling SQL Update Statements" on the dbt Discourse for patterns on managing statements that handle specific value cleaning. Here is an example from Kyle Ries:
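{% set mappings = {'something': 'boo', 'something-else': 'boo-else'} %}
with source as (
select * from {{ ref('stg_foobar') }}
),
final as (
select
case
{# .items() is needed to unpack key/value pairs; iterating a dict yields only keys #}
{% for old, new in mappings.items() %}
when other_column like '{{ old }}' then '{{ new }}'
{% endfor %}
else other_column -- keep values that have no mapping instead of returning NULL
end as column_name
from
source
)
select * from final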
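To see what the template renders to, here is a minimal Python sketch. It is a plain-Python emulation, not dbt or Jinja code: it builds the same CASE expression from the mappings dict (looping over `.items()`, since iterating a dict directly yields only keys) and runs it against an in-memory SQLite table. The table `stg_foobar` and column `other_column` come from the example; the sample rows are made up, and the sketch adds an ELSE branch so unmapped values are kept rather than becoming NULL.

```python
import sqlite3

# Same mappings as in the Jinja example.
mappings = {"something": "boo", "something-else": "boo-else"}


def render_case(column, mapping):
    """Build the CASE expression the Jinja loop would render (plain-Python emulation).

    Values are inlined as text, as the template does; only safe for trusted,
    hard-coded mappings.
    """
    branches = "\n".join(
        f"    when {column} like '{old}' then '{new}'" for old, new in mapping.items()
    )
    # ELSE keeps values that have no mapping instead of turning them into NULL.
    return f"case\n{branches}\n    else {column}\nend"


conn = sqlite3.connect(":memory:")
conn.execute("create table stg_foobar (other_column text)")
# Hypothetical sample data.
conn.executemany(
    "insert into stg_foobar values (?)",
    [("something",), ("something-else",), ("untouched",)],
)

sql = f"""
with source as (select * from stg_foobar)
select {render_case('other_column', mappings)} as column_name from source
"""
cleaned = [row[0] for row in conn.execute(sql)]
print(cleaned)
```

A LIKE pattern without wildcards behaves like an equality test here, which is why 'something-else' does not match the first branch.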
The first time I run an incremental model in dbt it works just fine, but the second time I run it I get this error:
Database Error in model my_incremental_model(models\my_incremental_model.sql)
operator does not exist: text || boolean
HINT: No operator matches the given name and argument type(s). You may need to add explicit type casts.
compiled SQL at target\run\dbt\models\my_incremental_model.sql
The table has columns of type bigint, string, boolean, and int. Any ideas? Here is the model:
{{ config(
materialized = 'incremental',
unique_key = "col1||col2||col3||col4",
sort = ["col1", "col2", "col3", "col4"]
) }}
select distinct
col1
,col2
,col3
,col4
from
{{ source("src", "some_table") }}
dbt can build the composite key by itself; you don't need to do it manually. Just replace your unique key definition with:
unique_key = ['col1', 'col2', 'col3', 'col4']
The way you are creating the key manually might not be supported. It might be worth looking at the erroneous compiled SQL referenced in your error message:
...compiled SQL at target\run\dbt\models\my_incremental_model.sql...
Found a solution: I had to cast the boolean to an integer:
unique_key = "col1||col2||cast(col3 as integer)||col4",
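As a rough illustration of why listing the key columns is more robust than concatenating them, here is a minimal Python/SQLite sketch of a delete+insert incremental step, roughly the pattern dbt uses on adapters such as Redshift and Postgres (this is not dbt's code). It matches each key column separately, so no cast is needed, and it shows how a concatenated key can collide. The helpers `delete_insert` and `concat_key`, the table names, and the rows are hypothetical.

```python
import sqlite3

KEY_COLS = ["col1", "col2", "col3", "col4"]  # composite key, like the list form of unique_key


def delete_insert(conn, target, batch, key_cols):
    """Hypothetical delete+insert step: drop target rows whose key is in batch, then append batch."""
    # Compare column by column, so each column keeps its own type (no text || boolean).
    match = " and ".join(f"{target}.{c} = {batch}.{c}" for c in key_cols)
    conn.execute(f"delete from {target} where exists (select 1 from {batch} where {match})")
    conn.execute(f"insert into {target} select * from {batch}")


def concat_key(values):
    """The manual approach: glue the key columns into one string."""
    return "".join(str(v) for v in values)


conn = sqlite3.connect(":memory:")
for name in ("target", "batch"):
    # SQLite stores booleans as 0/1 integers; col3 plays the boolean column here.
    conn.execute(f"create table {name} (col1 integer, col2 text, col3 integer, col4 integer)")
conn.executemany("insert into target values (?, ?, ?, ?)", [(1, "a", 0, 10), (2, "b", 1, 20)])
# The second run delivers one row that already exists and one new row.
conn.executemany("insert into batch values (?, ?, ?, ?)", [(1, "a", 0, 10), (3, "c", 0, 30)])

delete_insert(conn, "target", "batch", KEY_COLS)
rows = sorted(conn.execute("select * from target"))
print(rows)

# Concatenation can map different keys to the same string: "1"+"23" and "12"+"3".
print(concat_key((1, "23")) == concat_key((12, "3")))
```

Without the delete step, the existing row (1, 'a', 0, 10) would be appended a second time.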
I need a dynamic loop in dbt based on a column value of each row:
select id,loop_count,
{% set row_loop_cnt %}
loop_count
{% endset %}
{% for i in range(loop_count) %}
//creating a list
{% endfor %}
created_list as column_name
from table_name
I am getting a "'str' object cannot be interpreted as an integer" error.
I tried multiple ways of casting, like:
loop_count::int (Redshift)
loop_count | int (Jinja)
But no luck. Could you please help me here?
Macros are compiled (templated) before the query is run. That means the data in your database never passes through the Jinja templater. When you {% set row_loop_cnt = "loop_count" %}, you're just passing a string with the value loop_count into Jinja, not the data from the field with that name.
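The failure can be reproduced in plain Python: the message in the question is the one Python's built-in `range` raises when given a string. At compile time the set block holds only the literal text `loop_count` (the column name, with surrounding whitespace), never a row's value. A minimal sketch, assuming that compile-time value:

```python
# At compile time, the set block captures only the text between its tags:
# the column *name* (with surrounding whitespace), not any row's value.
row_loop_cnt = "\nloop_count\n"

try:
    list(range(row_loop_cnt))
except TypeError as exc:
    error = str(exc)
print(error)

# Casting cannot help: the text "loop_count" is not a number at all.
# (Jinja's int filter would silently return its default of 0 instead of raising.)
try:
    int(row_loop_cnt)
except ValueError as exc:
    cast_error = str(exc)
print(cast_error)
```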
From your query, I assume that the table_name table contains a field called loop_count, and that field's data includes an integer that you would like to use to repeat a value in another column.
In most databases, you can do this with SQL and not involve Jinja at all. It's possible to use the run_query macro to pull data into the Jinja context, but this is slow and error-prone, and not really applicable when each row of your data needs to reference a different value.
Assuming the simplest possible implementation of // creating a list, I would write this query as:
select id, loop_count, repeat(value_col || ',', loop_count) as created_list
from table_name
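To make the per-row behaviour concrete, here is a small Python sketch of what `repeat(value_col || ',', loop_count)` yields for each row. The rows and `value_col` values are hypothetical; the point is that the database evaluates the expression with each row's own `loop_count`, which a compile-time Jinja loop cannot do.

```python
# Hypothetical rows of table_name: (id, loop_count, value_col).
rows = [(1, 3, "a"), (2, 1, "b"), (3, 0, "c")]


def created_list(value_col, loop_count):
    """Python equivalent of repeat(value_col || ',', loop_count) for one row."""
    return (value_col + ",") * loop_count


# Each row gets its own repetition count, just like the SQL expression.
result = [(row_id, n, created_list(value, n)) for row_id, n, value in rows]
for row in result:
    print(row)
```

Note that the output ends with a trailing comma, which you may want to trim.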
Hi, I have the following table, Sales:
Country | Sales | Sale_Date
US      | 2     | 12-06-2022
JP      | 2     | 15-06-2022
I have to write a SQL query in dbt to update a particular cell: I want to change Sale_Date for US.
My current query is:
UPDATE `sales`
SET
Sale_Date = '2022-06-16'
WHERE
Country = 'US'
However, in dbt I get the following error:
Server error: Database Error in rpc request (from remote system)
Syntax error: Expected end of input but got keyword LIMIT at [4:1]
What am I missing? I am fairly new to dbt.
dbt is designed to be used mainly with SELECT statements. You can find a bit more context here.
What you might want to do is to just apply that transformation using a SELECT statement as follows:
select
[...],
case when country = 'US' then cast('2022-06-16' as date) else sale_date end as <new_column_name>
from {{ ref('your_staging_model') }} {# or: {{ source('your_schema', 'your_source_table') }} #}
So now, you will have a dbt model materialized either as a view, a table, or whatever you prefer, that contains a column with the UPDATE-like transformation that you needed.
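A quick way to convince yourself that the SELECT produces the same data the UPDATE would have is to run both against a copy of the Sales table. This Python/SQLite sketch does that; it stores dates as ISO-8601 text because SQLite has no DATE type, and it reads the question's dates as DD-MM-YYYY. Table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (country text, sales integer, sale_date text)")
# Rows from the question, with dates rewritten as ISO-8601 text.
conn.executemany(
    "insert into sales values (?, ?, ?)",
    [("US", 2, "2022-06-12"), ("JP", 2, "2022-06-15")],
)

# dbt-style: derive the corrected column with a SELECT, leaving sales untouched.
select_sql = """
select country, sales,
       case when country = 'US' then '2022-06-16' else sale_date end as sale_date
from sales
order by country
"""
model_rows = conn.execute(select_sql).fetchall()

# Imperative style: run the UPDATE on a copy and read it back.
conn.execute("create table sales_copy as select * from sales")
conn.execute("update sales_copy set sale_date = '2022-06-16' where country = 'US'")
updated_rows = conn.execute("select * from sales_copy order by country").fetchall()

# The source table itself still holds the original value.
original = conn.execute("select sale_date from sales where country = 'US'").fetchone()[0]
print(model_rows == updated_rows, original)
```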
I am trying to write a SQL file in dbt in order to update models with a new column:
{{
config(materialized='table'
, retain_previous_version_flg=false
, migrate_data_over_flg=true
)
}}
CREATE OR REPLACE TABLE {{ ref ('my_table') }} (
SELECT *, new_columns_ts TIMESTAMP
);
Is there a way to use CREATE directly rather than having to use a SELECT or WITH clause?
Syntax error: Expected "(" or keyword SELECT or keyword WITH but got keyword CREATE at [16:1]
In this case, you don't need a CREATE OR REPLACE TABLE statement to create a materialized table; you only need to write the SELECT statement.
Model files never contain CREATE OR REPLACE statements: dbt generates that DDL from the materialization you configure. This also means that dbt does not offer a way to issue CREATE TABLE statements for source tables.
Here is an example:
{{
config(materialized='table'
, retain_previous_version_flg=false
, migrate_data_over_flg=true
)
}}
SELECT
*,
CAST(NULL AS TIMESTAMP) AS new_columns_ts -- new, empty TIMESTAMP column
FROM `dataset.table`
Outside dbt, the equivalent in plain BigQuery SQL would be:
CREATE OR REPLACE TABLE `dataset.table` AS
SELECT *, CAST(NULL AS TIMESTAMP) AS new_columns_ts
FROM `dataset.table`;
I need to extract all records which have a field which does NOT have a unique value.
I can't figure out an elegant way to do it, using annotation or some other way. I see a "value_annotate" method on the object manager, but it's unclear whether it's related at all.
Currently I'm using the inelegant approach of simply looping through all values and doing a get on each one; if there's an exception, the value is not unique.
Thanks
I can't say much about the Django part, but the query would look something like:
SELECT *
FROM foo
WHERE id IN (
SELECT MAX(id)
FROM foo
GROUP BY bar
HAVING COUNT(*)=1)
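This will return all records where the "bar" field is unique. For the non-unique records the question asks about, filter on the duplicated values instead: WHERE bar IN (SELECT bar FROM foo GROUP BY bar HAVING COUNT(*) > 1).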
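Both queries can be checked against a small in-memory SQLite table. In this sketch the table `foo` and column `bar` follow the answer, while the rows are made up: the first query keeps the records whose `bar` appears exactly once, and the second keeps every record whose `bar` is duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table foo (id integer primary key, bar text)")
# Hypothetical data: "x" appears twice, "y" and "z" once each.
conn.executemany("insert into foo (bar) values (?)", [("x",), ("y",), ("x",), ("z",)])

# Records whose bar value is unique (the query from the answer).
unique_sql = """
select * from foo
where id in (select max(id) from foo group by bar having count(*) = 1)
order by id
"""
# Records whose bar value is NOT unique: every row of each duplicated group.
non_unique_sql = """
select * from foo
where bar in (select bar from foo group by bar having count(*) > 1)
order by id
"""
unique_rows = conn.execute(unique_sql).fetchall()
non_unique_rows = conn.execute(non_unique_sql).fetchall()
print(unique_rows)
print(non_unique_rows)
```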
I'd go directly to a raw query in this case. It would look something like the following, assuming you're using Django 1.2 or later:
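# _meta.db_table gives the real table name of the Table model
query = """
SELECT *
FROM {table}
WHERE field IN (
    SELECT field
    FROM {table}
    GROUP BY field
    HAVING COUNT(*) > 1
)
""".format(table=Table._meta.db_table)
non_uniques = Table.objects.raw(query)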
For versions earlier than 1.2, see the Django docs on raw queries.