Issues running an INSERT INTO statement in dbt

I am new to dbt and I am trying to write a data model that inserts its result into an existing table on Snowflake. The code below (without the config) runs smoothly on Snowflake, but it doesn't work in dbt. Are INSERT statements not supported in dbt?
{{ config(materialized = 'view') }}
INSERT INTO "v1" ("ID", "Value","Set",ROWNUM )
with LD as(
select "ID",
"Value",
"Set",
ROW_NUMBER()OVER ( PARTITION BY "ID" order by "Set" desc ) as rownum
from "Archive"."Prty" l
where l."Prty" = 'Log' AND "ID"= 111
),
LD2 as (
select "ID",
"Value",
"Set",
ROWNUM
from LD where ROWNUM = 1
)
SELECT * FROM LD2

You could use an incremental model to achieve this all in the same table. You can also use the qualify clause to remove the need for the second CTE. I am assuming ID should be unique, but refer to the link and modify if this is not the case.
{{
config(
materialized='incremental',
unique_key='ID'
)
}}
select
"ID",
"Value",
"Set"
from "Archive"."Prty" as l
where l."Prty" = 'Log' and "ID" = 111
qualify row_number() over (partition by "ID" order by "Set" desc) = 1
This will insert any rows that satisfy the above query and have an ID not already in the table. Otherwise, it will update any rows where the ID already does exist.
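As a quick local sanity check of the dedup logic, the same row_number()-per-ID filter can be run against an in-memory SQLite database. SQLite has no qualify clause, so the window function is wrapped in a subquery; the table and data here are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prty (id INTEGER, value TEXT, set_no INTEGER);
INSERT INTO prty VALUES (111, 'old', 1), (111, 'new', 2), (222, 'x', 1);
""")

# qualify row_number() ... = 1 rewritten as a wrapped subquery:
rows = conn.execute("""
    SELECT id, value, set_no FROM (
        SELECT id, value, set_no,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY set_no DESC) AS rn
        FROM prty
        WHERE id = 111
    ) AS x
    WHERE rn = 1
""").fetchall()
print(rows)  # keeps only the latest row per id -> [(111, 'new', 2)]
```

On Snowflake the qualify form in the answer above is equivalent and shorter; the subquery rewrite is only needed on engines without qualify.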

Related

Is there a way to query a specific data point if it exists, otherwise query everything else?

Say I have a table with a column called "Names" and with values "Mike", "John", "Kelly", and "Tina". Every day the values might change.
How would I structure the query so that if the table has the name "Tina", it only displays "Tina", but if it doesn't contain "Tina", it'll display everything else?
Another option to consider (BigQuery Standard SQL)
#standardSQL
SELECT * EXCEPT(flag) FROM (
SELECT *, names = 'Tina' OR COUNTIF(names = 'Tina') OVER() = 0 AS flag
FROM `project.dataset.table`
)
WHERE flag
I would expect this version to be significantly cheaper than one with an implicit join.
One option is union all and not exists:
select name from mytable where name = 'Tina'
union all
select t.name from mytable t where not exists (select 1 from mytable t1 where t1.name = 'Tina')
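The union all + not exists approach is portable enough to try on an in-memory SQLite database; a minimal sketch with made-up data, showing both cases (the target name present, then absent):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)",
                 [("Mike",), ("John",), ("Kelly",), ("Tina",)])

QUERY = """
SELECT name FROM mytable WHERE name = 'Tina'
UNION ALL
SELECT t.name FROM mytable t
WHERE NOT EXISTS (SELECT 1 FROM mytable t1 WHERE t1.name = 'Tina')
"""

# 'Tina' exists, so the first branch returns her and the second is empty.
with_tina = conn.execute(QUERY).fetchall()

# Remove 'Tina': the first branch is empty, the second returns everyone.
conn.execute("DELETE FROM mytable WHERE name = 'Tina'")
without_tina = conn.execute(QUERY).fetchall()

print(with_tina)     # [('Tina',)]
print(without_tina)  # the three remaining names
```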

For each selected row insert another?

I have a single table TableA. It has columns id, type, relatedId, another1, another2. Column type can have values 1, 2 or 3.
What I need is, for each row in TableA, where type = 1, insert another row in the same table and update the original row (column relatedId) with id of newly inserted row. Also, values for some columns in newly inserted row should be copied from the original one.
So for current state:
id|type|relatedId|another1
10| 1 |null|"some text"
11| 2 |null|"somthing"
12| 1 |null|"somthing else"
result should be following:
id|type|relatedId|another1
10| 1 |13 |"some text" - now has relationship to 13
11| 2 |null|"somthing"
12| 1 |14 |"somthing else" - now has relationship to 14
13| 3 |null|"some text" - inserted, "another1" is copied from 10
14| 3 |null|"somthing else" - inserted, "another1" is copied from 12
Assuming the texts are unique you can do this:
demo:db<>fiddle
WITH ins AS (
INSERT INTO tablea(type, related_id, another1)
SELECT 3, null, another1
FROM tablea
WHERE type = 1
RETURNING id, another1
)
UPDATE tablea t
SET related_id = s.id
FROM (
SELECT * FROM ins
) s
WHERE s.another1 = t.another1 AND t.type = 1
The WITH clause lets you execute two separate statements sequentially: first insert the new data, then use the newly generated ids to update the old data. Because you have to match the original rows, the text works as an identifier.
This only works if you do not have two rows like (1, 'something'). Then it would be hard to tell which of the two records is the original for each copy.
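SQLite has no data-modifying CTEs, but the same two-step idea can be sketched as two separate statements. This is an illustrative schema, and it makes the same assumption as the answer above: another1 is unique among type-1 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tablea (
    id INTEGER PRIMARY KEY, type INTEGER, related_id INTEGER, another1 TEXT)""")
conn.executemany("INSERT INTO tablea(type, related_id, another1) VALUES (?,?,?)",
                 [(1, None, "some text"), (2, None, "somthing"),
                  (1, None, "somthing else")])

# Step 1 (the INSERT part of the CTE): copy every type-1 row as a type-3 row.
conn.execute("""INSERT INTO tablea(type, related_id, another1)
                SELECT 3, NULL, another1 FROM tablea WHERE type = 1""")

# Step 2 (the UPDATE part): link each original to its copy by the unique text.
conn.execute("""UPDATE tablea SET related_id =
                  (SELECT s.id FROM tablea s
                   WHERE s.type = 3 AND s.another1 = tablea.another1)
                WHERE type = 1""")

rows = conn.execute(
    "SELECT id, type, related_id, another1 FROM tablea ORDER BY id").fetchall()
print(rows)
```

The result matches the expected table from the question: rows 1 and 3 now point at their copies (ids 4 and 5).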
Another way could be to store the type1-ids in the new type3-columns as well. If this would be ok for you, you could do this:
demo:db<>fiddle
WITH ins AS (
INSERT INTO tablea(type, related_id, another1)
SELECT 3, id, another1
FROM tablea
WHERE type = 1
RETURNING id, related_id, another1
)
UPDATE tablea t
SET related_id = s.id
FROM (
SELECT * FROM ins
) s
WHERE s.related_id = t.id
This stores the original type1-ids in the related_id column of the new ones. So in every case the original id can be found over this value.
Unfortunately, you cannot NULL out these columns in another WITH clause, because WITH clauses only see existing data. At that moment the query itself has not finished yet, so the new records do not physically exist.
This one could work...
demo:db<>fiddle
WITH to_be_copied AS (
SELECT id, another1
FROM tablea
WHERE type = 1
), ins AS (
INSERT INTO tablea(type, related_id, another1)
SELECT 3, null, another1
FROM to_be_copied
ORDER BY id -- 1
RETURNING id, another1
)
UPDATE tablea t
SET related_id = s.type3_id
FROM (
SELECT
*
FROM
(SELECT id as type1_id, row_number() OVER (ORDER BY id) FROM to_be_copied) tbc
JOIN
(SELECT id as type3_id, row_number() OVER (ORDER BY id) FROM ins) i
ON tbc.row_number = i.row_number
) s
WHERE t.id = s.type1_id
This solution assumes that the ORDER BY at (1) guarantees the insertion order of the new records. In fact, I am not quite sure about that. But if so: first all type-1 records are queried. After that they are copied (in the same order!). Then the ids of the old and the new records are taken. The row_number() window function adds a consecutive row count to the records. So if both data sets have the same order, the old ids should get the same row numbers as their corresponding new ids. In that case an identification is possible. For the small example this works...
--> Edit: This seems to say: Yes, the order will be preserved since Postgres 9.6 https://stackoverflow.com/a/50822258/3984221
According to this question Postgres retains the order of row inserted via a SELECT with explicit ORDER BY as of 9.6. We can use this to connect the inserted rows with those they come from using row_number().
WITH
"cte1"
AS
(
SELECT "id",
3 "type",
"related_id",
"another1",
row_number() OVER (ORDER BY "id") "rn"
FROM "tablea"
WHERE "type" = 1
),
"cte2"
AS
(
INSERT INTO "tablea"
("type",
"another1")
SELECT "type",
"another1"
FROM "cte1"
ORDER BY "id"
RETURNING "id"
),
"cte3"
AS
(
SELECT "id",
row_number() OVER (ORDER BY "id") "rn"
FROM "cte2"
)
UPDATE "tablea"
SET "related_id" = "cte3"."id"
FROM "cte1"
INNER JOIN "cte3"
ON "cte3"."rn" = "cte1"."rn"
WHERE "cte1"."id" = "tablea"."id";
In the first CTE we get all the rows that should be inserted, along with their row_number() ordered by ID. In the second one we insert them, selecting from the first CTE and explicitly ordering by ID. We return the inserted ids from the second CTE so that we can select them in the third CTE, where we again add a row_number() ordered by ID. We can now join the first and third CTEs via the row number to get pairs of original and newly inserted ids. Based on that we can update the table, setting the related ids.
db<>fiddle
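If you would rather not rely on any insertion-order guarantee at all, the pairing can be done client-side. A sketch in Python with sqlite3 (hypothetical schema): each original is paired with its copy explicitly via the id returned for each single-row insert, so duplicate texts are no problem:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tablea (
    id INTEGER PRIMARY KEY, type INTEGER, related_id INTEGER, another1 TEXT)""")
conn.executemany("INSERT INTO tablea(type, related_id, another1) VALUES (?,?,?)",
                 [(1, None, "dup"), (2, None, "other"), (1, None, "dup")])

# Copy each type-1 row individually; lastrowid gives the new id, so the
# original can be linked immediately, independent of any ordering guarantee.
originals = conn.execute(
    "SELECT id, another1 FROM tablea WHERE type = 1 ORDER BY id").fetchall()
for orig_id, text in originals:
    cur = conn.execute(
        "INSERT INTO tablea(type, related_id, another1) VALUES (3, NULL, ?)",
        (text,))
    conn.execute("UPDATE tablea SET related_id = ? WHERE id = ?",
                 (cur.lastrowid, orig_id))

rows = conn.execute(
    "SELECT id, type, related_id FROM tablea ORDER BY id").fetchall()
print(rows)
```

The trade-off is one round trip per copied row instead of a single statement, which only matters for large row counts.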

PostgreSQL. I need a hierarchical table to have a constraint so no node could have the same name on the same level

I have a hierarchical table with duplicate names on the same level. Example -
user (int id, string name, int parent_id)
1, Sam, null
2, Mike, 1
3, Mike, 1
4, Mike, 1
I need to make them like this
1, Sam, null
2, Mike#1, 1
3, Mike#2, 1
4, Mike#3, 1
And somehow add constraint. How can I do that?
You may use row_number() to generate those sequence numbers, and the COUNT analytic function to check whether a sequence number is needed at all:
SELECT id,
CONCAT(name,
CASE
WHEN COUNT(*) OVER(
PARTITION BY name
) > 1 THEN --multiple names exist?
'#' || ROW_NUMBER() OVER(
PARTITION BY name
ORDER BY id )
END
) AS name, --else defaults to null (for single ones).
parent_id
FROM t
ORDER BY id;
It is not clear, when you say "I need a hierarchical table", whether you simply want to select the renamed values or create a new table. I would recommend not creating another table just to store mostly the same values; instead, create a VIEW using the above query as its base.
Demo
First I needed to rename duplicates on the same level, even at the root level where parent_id is null. This code does it:
update user user_update set name = name || '#' || (
select user_count_ids.number
from (
select user_row_count.id id, row_number() over (order by user_row_count.id) number
from user user_row_count
where user_update.name = user_row_count.name and user_update.parent_id is not distinct from user_row_count.parent_id
) as user_count_ids
where user_count_ids.id = user_update.id
)
where (
select count(*) > 1
from user user_count
where user_update.name = user_count.name and user_update.parent_id is not distinct from user_count.parent_id
);
Then I needed some sort of constraint. Thanks to http://stackoverflow.com/a/8289253/5292928 I added this code
create unique index unique_name_parentId_when_parentId_is_not_null
on user (name, parent_id)
where parent_id is not null;
create unique index unique_name_when_parentId_is_null
on user (name)
where parent_id is null;

How to retrieve the id of an inserted row when using upsert with a WITH clause in Postgres 9.5?

I'm trying to do an upsert query in Postgres 9.5 using WITH:
with s as (
select id
from products
where product_key = 'test123'
), i as (
insert into products (product_key, count_parts)
select 'test123', 33
where not exists (select 1 from s)
returning id
)
update products
set product_key='test123', count_parts=33
where id = (select id from s)
returning id
Apparently I'm retrieving the id only on updates and getting nothing on insertions, even though I know the insertions succeeded.
I need to modify this query in a way I'll be able the get the id both on insertions and updates.
Thanks!
It wasn't clear to me why your WITH starts with a SELECT, but the reason you only get the id back from updates is that you never select the INSERT's RETURNING result.
As mentioned (and linked) in the comments, Postgres 9.5 supports the INSERT ON CONFLICT clause, which is a much cleaner way to do this.
And some examples of before and after 9.5:
Before 9.5: common way using WITH
WITH u AS (
UPDATE products
SET product_key='test123', count_parts=33
WHERE product_key = 'test123'
RETURNING id
),i AS (
INSERT
INTO products ( product_key, count_parts )
SELECT 'test123', 33
WHERE NOT EXISTS( SELECT 1 FROM u )
RETURNING id
)
SELECT *
FROM ( SELECT id FROM u
UNION SELECT id FROM i
) r;
After 9.5: using INSERT .. ON CONFLICT
INSERT INTO products ( product_key, count_parts )
VALUES ( 'test123', 33 )
ON CONFLICT ( product_key ) DO
UPDATE
SET product_key='test123', count_parts=33
RETURNING id;
UPDATE:
As hinted in a comment, there may be slight downsides to the INSERT .. ON CONFLICT approach.
If the table uses an auto-incrementing key and this query runs frequently, the WITH version might be the better option, since each conflicting INSERT attempt still consumes a sequence value.
See more: https://stackoverflow.com/a/39000072/1161463
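SQLite has supported the same ON CONFLICT upsert syntax since 3.24, so the "after 9.5" form can be tried locally almost verbatim; excluded refers to the row that failed to insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    product_key TEXT UNIQUE,
    count_parts INTEGER)""")

UPSERT = """
INSERT INTO products (product_key, count_parts) VALUES (?, ?)
ON CONFLICT (product_key) DO UPDATE SET count_parts = excluded.count_parts
"""
conn.execute(UPSERT, ("test123", 33))  # first run: inserts a new row
conn.execute(UPSERT, ("test123", 44))  # second run: updates the same row

rows = conn.execute(
    "SELECT id, product_key, count_parts FROM products").fetchall()
print(rows)  # still a single row, with the updated count
```

On Postgres you would additionally append RETURNING id to get the key back from either branch, as the answer shows.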

Insert with returning a subquery

I am trying to insert a record in an m:n table (User-Group Relation) and return the group when the user successfully joined.
But I can't manage to return the whole group after the insert.
with "group" as (
SELECT * from "group" where code = 'tohubo' LIMIT 1
)
insert into group_users__user_groups ("group_users", "user_groups")
select id from "group", 1
returning (SELECT * from "group")
With that query I currently get the error message
subquery must return only one column
I also tried to just return *, but then I only get the content of group_users__user_groups.
I also tried to add an additional Select at the end:
with "found_group" as (
SELECT * from "group" where code = 'tohubo' LIMIT 1
)
insert into group_users__user_groups ("group_users", "user_groups")
select 1, id from "found_group";
Select * from "found_group";
But then the WITH part is not defined in the second query:
Kernel error: ERROR: relation "found_group" does not exist
The returning clause can only return data that was affected by the insert.
And a statement with CTEs can only have one "final" statement, not an insert and a select.
But you can simply move the insert into a second CTE, and then have a single SELECT at the end that returns the data that was found:
with found_group as (
SELECT *
from "group"
where code = 'tohubo'
LIMIT 1
), inserted as (
insert into group_users__user_groups (group_users, user_groups)
select 1, id
from found_group
)
Select *
from found_group;
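Outside of Postgres' data-modifying CTEs, the same "insert the link, return the group" pattern is often done client-side in one transaction. A sketch with Python and sqlite3 (the table is named grp here because "group" is a reserved word, and the user id 1 is hard-coded as in the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grp (id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE group_users__user_groups (group_users INTEGER, user_groups INTEGER);
INSERT INTO grp VALUES (7, 'tohubo');
""")

with conn:  # one transaction: commits on success, rolls back on error
    # Find the group first (the answer's first CTE) ...
    group = conn.execute(
        "SELECT * FROM grp WHERE code = ? LIMIT 1", ("tohubo",)).fetchone()
    # ... then insert the join-table row (the answer's second CTE).
    conn.execute(
        "INSERT INTO group_users__user_groups (group_users, user_groups) "
        "VALUES (?, ?)", (1, group[0]))

print(group)  # the whole group row, available after the insert
```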