dbt + jinja: Cannot cast column value to int

I need a dynamic loop in dbt based on a column value of each row:
select id,loop_count,
{% set row_loop_cnt %}
loop_count
{% endset %}
{% for i in range(loop_count) %}
//creating a list
{% endfor %}
created_list as column_name
from table_name
I am getting a 'str object cannot be interpreted as an integer' error.
I tried multiple ways of casting, like
loop_count::int (Redshift)
loop_count | int (Jinja)
but no luck. Could you please help me here?

Macros are compiled (templated) before the query is run. That means that the data in your database doesn't run through the jinja templater. When you {% set row_loop_cnt = "loop_count"%} you're just passing a string with the value loop_count into jinja, not the data from the field with that name.
From your query, I assume that the table_name table contains a field called loop_count, and that field's data includes an integer that you would like to use to repeat a value in another column.
In most databases, you can do this with SQL, and not involve jinja at all. It's possible to use the run_query macro to pull data into the jinja context, but this is slow and error-prone, and not really applicable in a situation where each row of your data wants to reference a different value.
Assuming the simplest possible implementation of // creating a list, I would write this query as:
select id, loop_count, repeat(value_col || ',', loop_count) as created_list
from table_name
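For reference, if you genuinely needed a single warehouse value inside the jinja context (say, the maximum loop_count across the table), a run_query sketch might look roughly like this; the model name in ref(), the max_cnt variable and the loop body are assumptions, and the {% if execute %} guard keeps parsing from failing before a database connection is available:
{% set max_cnt_query %}
select max(loop_count) as max_cnt from {{ ref('table_name') }}
{% endset %}
{% if execute %}
{# run_query returns an agate table; take the single cell and force it to an int #}
{% set max_cnt = run_query(max_cnt_query).columns[0].values()[0] | int %}
{% else %}
{% set max_cnt = 0 %}
{% endif %}
select
id,
loop_count
{% for i in range(max_cnt) %}
, case when loop_count > {{ i }} then {{ i }} end as loop_step_{{ i }}
{% endfor %}
from {{ ref('table_name') }}
Even then, the loop runs max_cnt times for every row, so the repeat() approach above is usually the better fit.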

Related

dbt Redshift Database Error in model: operator does not exist: text || boolean

The first time I run an incremental model in dbt it works just fine, but the second time I run it I get this error:
Database Error in model my_incremental_model(models\my_incremental_model.sql)
operator does not exist: text || boolean
HINT: No operator matches the given name and argument type(s). You may need to add explicit type casts.
compiled SQL at target\run\dbt\models\my_incremental_model.sql
The table has columns of type bigint, string, boolean, and int. Any ideas? Here is the model:
{{ config(
materialized = 'incremental',
unique_key = "col1||col2||col3||col4",
sort = ["col1", "col2", "col3", "col4"]
) }}
select distinct
col1
,col2
,col3
,col4
from
{{ source("src", "some_table") }}
dbt can build the composite key by itself; we don't need to do it manually. You just need to replace your unique_key definition with:
unique_key = ['col1', 'col2', 'col3', 'col4']
The way you are creating the key manually might not be supported. It might be interesting to look at the erroneous generated/compiled SQL given in your error message:
...compiled SQL at target\run\dbt\models\my_incremental_model.sql..
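For reference, the first suggestion would make the full config look roughly like this (a sketch; list-form unique_key requires a reasonably recent dbt and dbt-redshift version, so check your adapter's docs first):
{{ config(
    materialized = 'incremental',
    unique_key = ['col1', 'col2', 'col3', 'col4'],
    sort = ['col1', 'col2', 'col3', 'col4']
) }}
select distinct
    col1,
    col2,
    col3,
    col4
from {{ source("src", "some_table") }}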
Found a solution. Had to cast the boolean to an integer:
unique_key = "col1||col2||cast(col3 as integer)||col4",

Updating a table in BigQuery using dbt

I am trying to update a table in BigQuery using dbt. The below command executes in BigQuery:
Update {{ ref('my_table') }}
SET variable = 'value'
WHERE lower(variable) LIKE '%XX%' or lower(variable) like '%YY%'
However, when I run it in dbt I get the following error:
Server error: Database Error in rpc request (from remote system)
Syntax error: Expected end of input but got keyword LIMIT at [4:1]
Does anyone know why this is happening and how to resolve?
It's a little unintuitive at first, I know, but with dbt every model is a select statement.
You should instead think of doing something like:
with cte as (
select * from {{ ref('my_table') }}
where <criteria>
)
select col1,
col2,
'value' as col3
from cte
Or possibly even simpler:
SELECT
'value' as variable
FROM {{ ref('my_table') }}
WHERE lower(variable) LIKE '%XX%' or lower(variable) like '%YY%'
Simply because during the dbt run cycle, the new values will be materialized into the new model.
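If you need to keep every other column unchanged and only rewrite the rows that match, BigQuery's select * replace is a compact option (a sketch, assuming variable is the column you want to overwrite):
select
    * replace (
        case
            when lower(variable) like '%xx%' or lower(variable) like '%yy%' then 'value'
            else variable
        end as variable
    )
from {{ ref('my_table') }}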
However, if you are looking for ways to clean underlying tables in a DRY way, I'd highly recommend this thread Modeling SQL Update Statements from the dbt discourse for some patterns on managing statements which handle specific value cleaning. Example from Kyle Ries:
{% set mappings = {'something': 'boo', 'something-else': 'boo-else'} %}
with source as (
    select * from {{ ref('stg_foobar') }}
),
final as (
    select
        case
            {% for old, new in mappings.items() %}
            when other_column like '{{ old }}' then '{{ new }}'
            {% endfor %}
        end as column_name
    from source
)
select * from final

Conditionally replace single value per row in jsonb column

I need a more efficient way to update rows of a single table in Postgres 9.5.
I am currently doing it with pg_dump and re-importing with updated values after search-and-replace operations in a Linux environment.
table_a has 300000 rows with 2 columns: id bigint and json_col jsonb.
json_col has about 30 keys: "C1" to "C30" like in this example:
Table_A
id,json_col
1 {"C1":"Paris","C2":"London","C3":"Berlin","C4":"Tokyo", ... "C30":"Dallas"}
2 {"C1":"Dublin","C2":"Berlin","C3":"Kiev","C4":"Tokyo", ... "C30":"Phoenix"}
3 {"C1":"Paris","C2":"London","C3":"Berlin","C4":"Ankara", ... "C30":"Madrid"}
...
The requirement is to mass search all keys from C1 to C30 then look in
them for the value "Berlin" and replace with "Madrid" and only if
Madrid is not repeated. i.e. id:1 with Key C3, and id:2 with C2. id:3
will be skipped because C30 exists with this value already
It has to be in a single SQL command in PostgreSQL 9.5, one time and considering all keys from the jsonb column.
The fastest and simplest way is to modify the column as text:
update table_a
set json_col = replace(json_col::text, '"Berlin"', '"Madrid"')::jsonb
where json_col::text like '%"Berlin"%'
and json_col::text not like '%"Madrid"%'
It's a practical choice. The above query is more of a find-and-replace operation (like in a text editor) than a modification of object attributes. The second option is more complicated and surely much more expensive. Even using the fast JavaScript engine (example below), a more formal solution would be many times slower.
You can try Postgres Javascript:
create extension if not exists plv8;
create or replace function replace_item(data jsonb, from_str text, to_str text)
returns jsonb language plv8 as $$
    var found = 0;
    // check whether the target value is already present under any key
    Object.keys(data).forEach(function(key) {
        if (data[key] == to_str) {
            found = 1;
        }
    })
    // only replace when the target value does not exist yet
    if (found == 0) {
        Object.keys(data).forEach(function(key) {
            if (data[key] == from_str) {
                data[key] = to_str;
            }
        })
    }
    return data;
$$;
update table_a
set json_col = replace_item(json_col, 'Berlin', 'Madrid');
What makes this hard is that you are looking for unknown keys holding values of interest. Postgres infrastructure is optimized to find keys (or array values).
Possibly caused by a sub-optimal table design. The many top-level objects of your jsonb column might be replaced by an array, discarding irrelevant key names altogether. (Or maybe another array for key names.) Or, ideally with a full normalized DB schema to begin with.
Be that as it may, here is a proof of concept, how this can be fast and clean with stock Postgres 9.5 or later anyway.
Additional difficulty 1: it's unknown whether duplicate values are possible.
Additional difficulty 2: value frequencies are unknown, too.
Additional difficulty 3: only the first value found is to be replaced and only if the target value is not there yet. Implementing this with set-based operations is possible, but unwieldy. I wrote a plpgsql function instead:
CREATE OR REPLACE FUNCTION jsonb_replace_value(_j jsonb, _old jsonb, _new jsonb)
RETURNS jsonb AS
$func$
DECLARE
_key text;
_val jsonb;
BEGIN
FOR _key, _val IN
SELECT * FROM jsonb_each(_j)
LOOP
IF _val = _old THEN
RETURN jsonb_set(_j, ARRAY[_key], _new); -- update 1st key
END IF;
END LOOP;
RETURN _j; -- nothing found, return original
END
$func$ LANGUAGE plpgsql IMMUTABLE;
COMMENT ON FUNCTION jsonb_replace_value(jsonb, jsonb, jsonb) IS '
Replace the first occurrence of _old value with _new.
Call:
SELECT jsonb_replace_value(''{"C1":"Paris","C3":"Berlin","C4":"Berlin"}'', ''"Berlin"'', ''"Madrid"'')';
Could be enhanced to optionally replace all occurrences etc. but that's beyond the scope of this question.
Now this would be simple:
UPDATE table_a
SET json_col = jsonb_replace_value(json_col, '"Berlin"', '"Madrid"'); -- note jsonb literal syntax!
If all rows need an update, we can stop here. Won't get faster. (Except possibly with alternatives like the one demonstrated by @klin.)
If a large percentage of all rows need an update, add a WHERE condition to avoid empty updates:
...
WHERE json_col <> jsonb_replace_value(json_col, '"Berlin"', '"Madrid"');
See:
How do I (or can I) SELECT DISTINCT on multiple columns?
Typically, only very few rows actually need an update. Then iterating through all rows with above query is expensive. We need index support to make it fast. Not easy for the case. I suggest an expression index based on an IMMUTABLE function extracting the array of values:
CREATE OR REPLACE FUNCTION jsonb_object_val_arr(jsonb)
RETURNS text[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY (SELECT value FROM jsonb_each_text($1))';
COMMENT ON FUNCTION jsonb_object_val_arr(jsonb) IS '
Generates text array of values in outermost jsonb object.
Of limited use if there can be nested objects.';
CREATE INDEX table_a_val_arr_idx ON table_a USING gin (jsonb_object_val_arr(json_col));
Related, with more explanation:
Find rows containing a key in a JSONB array of records
Query making use of this index:
UPDATE table_a a
SET    json_col = jsonb_replace_value(a.json_col, '"Berlin"', '"Madrid"')
WHERE  jsonb_object_val_arr(a.json_col) @> '{Berlin}'    -- has Berlin, possibly > 1x ..
-- AND NOT jsonb_object_val_arr(a.json_col) @> '{Madrid}'
AND    NOT EXISTS (                                      -- .. but not Madrid
   SELECT FROM table_a b
   WHERE  jsonb_object_val_arr(b.json_col) @> '{Madrid}' -- note array literal syntax
   AND    b.id = a.id
   );
The NOT EXISTS semi-anti-join is carefully drafted to utilize the index a 2nd time.
The commented simpler alternative is faster if there are few rows with 'Berlin' and 'Madrid' - then a filter step in the query plan will be cheaper.
Should be very fast.
db<>fiddle here for Postgres 9.5 demonstrating all.
OK, I have tested all methods and I can say you did a great job.
This helped me a lot. Let me share my feedback with you.
Method 1, suggested by klin, works perfectly and is totally fine, except if
a key is named like the value; then both the key and the value get replaced,
i.e. "Berlin":"Berlin" becomes "Madrid":"Madrid".
Method 2 with the plv8 extension did not work because I was missing the control file.
I would have had to install it, so I just skipped this method and have no
feedback regarding it.
The error I was getting was this:
ERROR: could not open extension control file
"/usr/pgsql-9.5/share/extension/plv8.control": No such file or directory
Method 3, similar to method 2 but with the jsonb_replace_value function,
works perfectly: it replaces rows that contain the specific value regardless
of the key. And adding the condition
WHERE json_col <> jsonb_replace_value(json_col, '"Berlin"', '"Madrid"')
avoids empty updates and skips rows that do not need to be updated.
And something like this:
{"Berlin":"Berlin"} becomes {"Berlin":"Madrid"}, i.e. the key is not touched, just the value.
Method 4 is a little more complicated; it uses Method 3 plus the indexes.
It works totally awesome and is super speedy.
The NOT EXISTS semi-anti-join indeed forced the planner to use the index again.
I was shocked how fast it performed!
However, I discovered that all these methods only work if the JSON string looks like this:
{"key":"value"}
If I have to update a value that is itself a JSON object, it will not update
something like this: {"C30":{"id":10044,"value":"Berlin","created_by":"John Doe"}}
MANY THANKS to you guys, @klin and @erwin-brandstetter. This helped me learn something new!
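(As a hedged follow-up to that last limitation: the plpgsql function above could in principle be extended to recurse into nested objects. The sketch below is an assumption, not part of the original answers; the name jsonb_replace_value_deep is made up and the code is untested.)
CREATE OR REPLACE FUNCTION jsonb_replace_value_deep(_j jsonb, _old jsonb, _new jsonb)
  RETURNS jsonb AS
$func$
DECLARE
   _key text;
   _val jsonb;
   _sub jsonb;
BEGIN
   FOR _key, _val IN
      SELECT * FROM jsonb_each(_j)
   LOOP
      IF _val = _old THEN
         RETURN jsonb_set(_j, ARRAY[_key], _new);             -- replace 1st match at this level
      ELSIF jsonb_typeof(_val) = 'object' THEN
         _sub := jsonb_replace_value_deep(_val, _old, _new);  -- recurse into nested object
         IF _sub IS DISTINCT FROM _val THEN
            RETURN jsonb_set(_j, ARRAY[_key], _sub);          -- a nested replacement happened
         END IF;
      END IF;
   END LOOP;
   RETURN _j;                                                 -- nothing found, return original
END
$func$ LANGUAGE plpgsql IMMUTABLE;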

Conditionally delete item inside an Array Field PostgreSQL

I'm building a kind of dictionary app and I have a table for storing words like below:
id | surface_form | examples
-----------------------------------------------------------------------
1 | sounds | {"It sounds as though you really do believe that",
| | "A different bell begins to sound midnight"}
Where surface_form is of type CHARACTER VARYING and examples is an array field of CHARACTER VARYING
Since the examples are generated automatically from another API, they might not contain the exact surface_form. Now I want to keep in examples only the sentences that contain the exact surface_form. For instance, in the example given, only the first sentence is kept, as it contains sounds; the second should be omitted, as it only contains sound.
The problem is I got stuck on how to write a query and/or a PL/pgSQL stored procedure to update the examples column so that it only has the desired sentences.
This query skips unwanted array elements:
select id, array_agg(example) new_examples
from a_table, unnest(examples) example
where surface_form = any(string_to_array(example, ' '))
group by id;
id | new_examples
----+----------------------------------------------------
1 | {"It sounds as though you really do believe that"}
(1 row)
Use it in update:
with corrected as (
select id, array_agg(example) new_examples
from a_table, unnest(examples) example
where surface_form = any(string_to_array(example, ' '))
group by id
)
update a_table
set examples = new_examples
from corrected
where examples <> new_examples
and a_table.id = corrected.id;
Test it in rextester.
Maybe you have to change the table design. This is what PostgreSQL's documentation says about the use of arrays:
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
Documentation:
https://www.postgresql.org/docs/current/static/arrays.html
The most compact solution (but not necessarily the fastest) is to write a function that you pass a regular expression and an array and which then returns a new array that only contains the items matching the regex.
create function get_matching(p_values text[], p_pattern text)
returns text[]
as
$$
declare
l_result text[] := '{}'; -- make sure it's not null
l_element text;
begin
foreach l_element in array p_values loop
-- adjust this condition to whatever you want
if l_element ~ p_pattern then
l_result := l_result || l_element;
end if;
end loop;
return l_result;
end;
$$
language plpgsql;
The if condition is only an example. You need to adjust it to whatever you exactly store in the surface_form column. Maybe you need to test on word boundaries with the regex, or a simple strpos() would do; your question is unclear about that.
Cleaning up the table then becomes as simple as:
update the_table
set examples = get_matching(examples, surface_form);
But the whole approach seems flawed to me. It would be a lot more efficient if you stored the examples in a properly normalized data model.
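For illustration, a minimal normalized layout might look like this (a sketch; the table and column names are assumptions, not taken from your schema):
create table word (
    id           bigserial primary key,
    surface_form text not null
);
create table example_sentence (
    word_id  bigint not null references word(id),
    sentence text not null
);
-- fetching the examples for a word is then a plain join, and bad examples
-- can be removed with an ordinary DELETE instead of rebuilding an array
select s.sentence
from word w
join example_sentence s on s.word_id = w.id
where w.surface_form = 'sounds';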
In SQL, you have to remember two things:
Tuple elements are immutable, but rows are mutable via updates.
SQL is declarative, not procedural.
So you cannot "conditionally" "delete" a value from an array. You have to think about the question differently: you have to create a new array following a specification. That specification can conditionally include values (using case expressions or a filter). Then you can overwrite the tuple with the new array, as sketched below.
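A minimal sketch of that idea, reusing the column names from the question (the table name a_table and the word-boundary regex are assumptions):
update a_table w
set examples = (
    select coalesce(array_agg(e), '{}')
    from unnest(w.examples) as e
    where e ~ ('\m' || w.surface_form || '\M')  -- keep only sentences containing the exact word
);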
Looks like one way could be to update the array with the array elements that are valid, by doing a select using LIKE or some regular expression.
https://www.postgresql.org/docs/current/static/arrays.html
If you want to keep the elements of the array that have surface_form in them, you have to keep the entries where substring(..., ...) is not null.
First you unnest the array, keep only the items that match, and then array_agg the kept items.
Here is a little query you can run to test without any table:
SELECT
id,
surface_form,
(SELECT array_agg(examples_matching)
FROM unnest(surfaces.examples) AS examples_matching
WHERE substring(examples_matching, surfaces.surface_form) IS NOT NULL)
FROM
(SELECT
1 AS id,
'example' :: TEXT AS surface_form,
ARRAY ['example form', 'test test','second example form'] :: TEXT [] AS examples
) surfaces;
You can select the data into a temp table using unnest().
Then update the temp table using an update query keyed on the row number.
Merge the values back together using array_agg().
That merged value you can then update in the original table.
For example, suppose you create a temp table
temp_element (id int, element character varying)
Then update the temp table and re-nest (aggregate) it.
Finally, update the original table.
Here is a query you can try directly in your editor:
CREATE TEMP TABLE IF NOT EXISTS temp_element (
    id bigint,
    element character varying
) WITH (OIDS);
TRUNCATE TABLE temp_element;

insert into temp_element
select row_number() over (order by p), p from (
    select unnest(ARRAY['It sounds as though you really do believe that',
                        'A different bell begins to sound midnight']) as p) t;

update temp_element set element = 'It sounds as though you really'
where element = 'It sounds as though you really do believe that';

-- update the original table with the re-aggregated array
select array_agg(element) from (select element from temp_element) r;

Oracle SQL "meta" query for records which have specific column values

I'd like to get all the records from a huge table where any of the number columns contains a value greater than 0. What's the best way to do it?
E.g.:
/* table structure*/
create table sometable (id number,
somestring varchar2(12),
some_amount_1 number(17,3),
some_amount_2 number(17,3),
some_amount_3 number(17,3),
...
some_amount_xxx number(17,3));
/* "xxx" > 100, and yeah I did not designed that table structure... */
And I want any row where any of the some_amount_n columns is > 0 (an even better solution would also add a field showing which column(s) are greater than zero).
I know I can write this with a huge some_amount_1 > 0 OR some_amount_2 > 0 OR ... block (and get the field names with some case when expressions), but surely there is a more elegant solution, isn't there?
Possible solutions:
Normalize the table. You said you are not allowed to. Try to convince those that forbid such a change by explaining the benefits (performance, ease of writing queries, etc).
Write the huge ugly OR query. You could also print it along with the version of the query for the normalized tables. Add performance tests (you are allowed to create another test table or database, I hope.)
Write a program (either in PL/SQL or in another procedural language) that produces the horrible OR query; see the sketch after this list. (Again, print it along with the elegant version.)
Add a new column, say Any_x_bigger_than_zero, which is automatically filled with either 0 or 1 via a trigger (that uses a huge ugly OR). Then you just need to check WHERE Any_x_bigger_than_zero = 1 to see if any of the amounts in the row is > 0.
Similar to previous but even better, create a materialized view with such a column.
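For option 3, a hedged sketch of generating the predicate straight from the data dictionary (assuming Oracle 11gR2+ for listagg, and that all the amount columns share the some_amount_ prefix):
select 'select id, somestring from sometable where '
       || listagg(column_name || ' > 0', ' or ') within group (order by column_id) as generated_sql
from   user_tab_columns
where  table_name = 'SOMETABLE'
and    column_name like 'SOME_AMOUNT%';
Run the generated statement (or paste it into the trigger / materialized view from options 4 and 5) instead of typing the 100+ conditions by hand.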
First, create a table to sort the data into something more easily read from... something simple like id, column_name, column_value. You'll have to bear with me, it's been a while since I've worked in Oracle, so this is heavy pseudo-code at best:
Quick dynamic sql blurb...you can set a variable to a sql statement and then execute that variable. There are some security risks and it's possible this feature is disabled in your environment...so confirm you can run this first. Declare a variable, set the variable to 'select 1' and then use 'execute immediate' to execute the sql stored in your variable.
set var = 'select id, ''some_amount_' || 1 || ''', some_amount_' || 1 || ' from table where some_amount_' || 1 || ' <> 0'
Assuming I've got my Oracle syntax right (|| is the concatenation operator, and a doubled single quote '' inside a string literal produces a single '), you may have to trial-and-error this line until you have the var set to:
select id, 'some_amount_1',some_amount_1
from table
where some_amount_1 <> 0
This should select the ID and the value in some_amount_1 for each id in your database. You can turn this into an insert statement pretty easily.
I'm assuming some_amount_xxx has an upper limit...next trick is to loop this giant statement. Once again, horrible pseudo code:
declare sql_string
declare i and set to 1
for i = 1 to xxx (whatever your xxx is)
set sql_string to the first set var statement we made, replacing the '1' with the i var here.
execute sql
increment i
loop
Hopefully it makes sense... it's one of the very few scenarios where you would ever want to loop dynamic SQL. Now you have a relatively straightforward table to read from, and this should be a relatively easy query from here.
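Putting that pseudo-code together, a hedged PL/SQL sketch might look like this (the staging table amount_values and the upper bound of 120 are assumptions; adjust both to your schema):
begin
    for i in 1 .. 120 loop   -- replace 120 with the real number of some_amount_ columns
        execute immediate
            'insert into amount_values (id, column_name, column_value) '
            || 'select id, ''some_amount_' || i || ''', some_amount_' || i
            || ' from sometable where some_amount_' || i || ' <> 0';
    end loop;
end;
/
After that, amount_values holds one row per (id, column) pair with a non-zero amount, which is easy to filter, join, or aggregate.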