Postgres: upsert a row and update a primary key column

Suppose I have two tables in my Postgres database:
create table transactions
(
id bigint primary key,
doc_id bigint not null,
-- lots of other columns...
amount numeric not null
);
-- same columns
create temporary table updated_transactions
(
id bigint primary key,
doc_id bigint not null,
-- lots of other columns...
amount numeric not null
);
Both tables have just a primary key, and no unique indexes.
I need to upsert rows from updated_transactions into transactions using the following rules:
the id column values in transactions and updated_transactions don't match
the other columns, like doc_id, etc. (all except amount), should match
when a matching row is found, update both the amount and id columns
when no matching row is found, insert it
id values in updated_transactions are taken from a sequence.
A business object just populates updated_transactions and then merges the
new or updated rows from it into transactions using an upsert query.
So my old unchanged transactions keep their ids intact, and the updated ones
are assigned new ids.
In MSSQL and Oracle, it would be a merge statement similar to this:
merge into transactions t
using updated_transactions ut on t.doc_id = ut.doc_id, ...
when matched then
update set t.id = ut.id, t.amount = ut.amount
when not matched then
insert (t.id, t.doc_id, ..., t.amount)
values (ut.id, ut.doc_id, ..., ut.amount);
In PostgreSQL, I suppose it should be something like this:
insert into transactions(id, doc_id, ..., amount)
select coalesce(t.id, ut.id), ut.doc_id, ... ut.amount
from updated_transactions ut
left join transactions t on t.doc_id = ut.doc_id, ....
on conflict
on constraint transactions_pkey
do update
set amount = excluded.amount, id = excluded.id
The problem is with the do update clause: excluded.id is an old value
from transactions table, while I need a new value from updated_transactions.
ut.id value is inaccessible for the do update clause, and the only thing I can
use is the excluded row. But the excluded row has only coalesce(t.id, ut.id)
expression which returns old id values for the existing rows.
Is it possible to update both id and amount columns using the upsert query?

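Create a unique constraint on the columns you use as the key and name it as the conflict target in your upsert, so that it is used instead of the primary key. (on conflict on constraint accepts only a constraint name; if you have just a plain unique index, list the key columns instead, as in on conflict (doc_id, ...).)
Then the statement inserts a row when no match is found, using the id from updated_transactions. When it finds a match, excluded.id holds the id from updated_transactions.
I think the left join on transactions is redundant.
So it would look something like this: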
insert into transactions(id, doc_id, ..., amount)
select ut.id, ut.doc_id, ... ut.amount
from updated_transactions ut
on conflict
on constraint transactions_multi_column_unique_index
do update
set amount = excluded.amount, id = excluded.id
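To make the idea concrete, here is a minimal sketch under assumptions that are not in the question: the matching key is taken to be (doc_id, unit_id), the constraint name transactions_doc_unit_key is made up, and the sample rows are only illustrative. It shows the conflict target moved from the primary key to the key columns, so that excluded.id carries the new id.

```sql
-- Assumption: (doc_id, unit_id) identifies a row; the constraint name is hypothetical.
create temp table transactions
(
    id      bigint primary key,
    doc_id  bigint not null,
    unit_id bigint not null,
    amount  numeric not null,
    constraint transactions_doc_unit_key unique (doc_id, unit_id)
);

create temp table updated_transactions
(
    id      bigint primary key,
    doc_id  bigint not null,
    unit_id bigint not null,
    amount  numeric not null
);

insert into transactions(id, doc_id, unit_id, amount)
values (1, 1, 1, 10), (2, 1, 2, 15), (3, 1, 3, 10);

insert into updated_transactions(id, doc_id, unit_id, amount)
values (6, 1, 1, 11), (7, 1, 2, 15), (8, 1, 4, 20);

-- The arbiter is the unique constraint on the key columns, not the primary key,
-- so excluded.id is the id proposed by updated_transactions.
insert into transactions(id, doc_id, unit_id, amount)
select ut.id, ut.doc_id, ut.unit_id, ut.amount
from updated_transactions ut
on conflict (doc_id, unit_id)
do update
set id = excluded.id, amount = excluded.amount;

select * from transactions order by id;
-- rows now: (3,1,3,10) unchanged; (6,1,1,11) and (7,1,2,15) updated; (8,1,4,20) inserted
```

This only works when the key columns really are unique. If a new id collides with an existing primary key value, the statement still raises an error, because only conflicts on the named target are handled.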

Looks like the task can be accomplished using writable CTEs instead of the plain upsert.
First, I'll post the easier version of the query, which answers the original question as it was asked. This solution assumes that the doc_id, unit_id columns form a candidate key, but it doesn't require a unique index on these columns.
Test data:
create temp table transactions
(
id bigint primary key,
doc_id bigint,
unit_id bigint,
amount numeric
);
create temp table updated_transactions
(
id bigint primary key,
doc_id bigint,
unit_id bigint,
amount numeric
);
insert into transactions(id, doc_id, unit_id, amount)
values (1, 1, 1, 10), (2, 1, 2, 15), (3, 1, 3, 10);
insert into updated_transactions(id, doc_id, unit_id, amount)
values (6, 1, 1, 11), (7, 1, 2, 15), (8, 1, 4, 20);
The query to merge updated_transactions into transactions:
with new_values as
(
select ut.id new_id, t.id old_id, ut.doc_id, ut.unit_id, ut.amount
from updated_transactions ut
left join transactions t
on t.doc_id = ut.doc_id and t.unit_id = ut.unit_id
),
updated as
(
update transactions tr
set id = nv.new_id, amount = nv.amount
from new_values nv
where id = nv.old_id
returning tr.*
)
insert into transactions(id, doc_id, unit_id, amount)
select ut.new_id, ut.doc_id, ut.unit_id, ut.amount
from new_values ut
where ut.new_id not in (select id from updated);
The results:
select * from transactions
-- id | doc_id | unit_id | amount
------+--------+---------+-------
-- 3 | 1 | 3 | 10 -- not changed
-- 6 | 1 | 1 | 11 -- updated
-- 7 | 1 | 2 | 15 -- updated
-- 8 | 1 | 4 | 20 -- inserted
In my real application doc_id, unit_id aren't always unique, so they don't represent a candidate key. To match the rows I take into account the row number, calculated for the rows sorted by their ids. So here's my second solution.
Test data:
-- the tables are the same as above
insert into transactions(id, doc_id, unit_id, amount)
values (1, 1, 1, 10), (2, 1, 1, 15), (3, 1, 3, 10);
insert into updated_transactions(id, doc_id, unit_id, amount)
values (6, 1, 1, 11), (7, 1, 1, 15), (8, 1, 4, 20);
The merge query:
with trans as
(
select id, doc_id, unit_id, amount,
row_number() over(partition by doc_id, unit_id order by id) row_num
from transactions
),
updated_trans as
(
select id, doc_id, unit_id, amount,
row_number() over(partition by doc_id, unit_id order by id) row_num
from updated_transactions
),
new_values as
(
select ut.id new_id, t.id old_id, ut.doc_id, ut.unit_id, ut.amount
from updated_trans ut
left join trans t
on t.doc_id = ut.doc_id and t.unit_id = ut.unit_id and t.row_num = ut.row_num
),
updated as
(
update transactions tr
set id = nv.new_id, amount = nv.amount
from new_values nv
where id = nv.old_id
returning tr.*
)
insert into transactions(id, doc_id, unit_id, amount)
select ut.new_id, ut.doc_id, ut.unit_id, ut.amount
from new_values ut
where ut.new_id not in (select id from updated);
The results:
select * from transactions;
-- id | doc_id | unit_id | amount
------+--------+---------+-------
-- 3 | 1 | 3 | 10 -- not changed
-- 6 | 1 | 1 | 11 -- updated
-- 7 | 1 | 1 | 15 -- updated
-- 8 | 1 | 4 | 20 -- inserted

Matching array of rows with multiple conditions

I have table transaction and transactionaction with the following columns. transactionid in transactionaction is a foreign key to transaction.
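transaction:
transactionid | name
--------------+--------
1             | Trans 1
2             | Trans 2
3             | Trans 3
4             | Trans 4
transactionaction:
actionid | transactionid | actiontype | value
---------+---------------+------------+------
1        | 1             | 1          | null
2        | 1             | 2          | null
3        | 1             | 3          | null
4        | 2             | 1          | 1
5        | 2             | 2          | null
6        | 2             | 3          | null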
I need to find every transaction that contains all the actions passed by the user. It's important to note that some actions are filtered only by actiontype (actions 2 and 3) and some also by value (action 1).
So if the user wants to find transactions with actions 2 and 3, he should get transactions 1 and 2. For this case, with the help of this answer https://stackoverflow.com/a/41140629/12035106, I created this query:
SELECT * from transaction
WHERE transactionid in (
SELECT transactionid
FROM public.transactionaction
group by transactionid
having array_agg(actiontype) @> array[2,3]
)
However, action 1 needs to take value into consideration, because action 1 with a null value is different from action 1 with a non-null value. In this case I cannot really use the previous query, so I came up with this one:
SELECT * from transaction
WHERE transactionid in (
SELECT transactionid
FROM public.transactionaction
group by transactionid
having array_agg(actiontype) @> array[2,3]
) AND transactionid in (
SELECT transactionid
FROM public.transactionaction
WHERE actiontype = 1 AND value is not null
)
This one works: as a result, I get only transaction 2. But I feel I overcomplicated it, because the query scans the same table multiple times. I created an index on actiontype, so the query plan looks better, but maybe there is an easier way to achieve the same result?
I think you have an underlying design problem, but even so I will try to help you with the query.
You can use a subquery to find out whether the transaction has an action of type 1 with a non-null value (using an aggregate as a window function with "over" to avoid the "group by"). Once you know that, you can filter on action types 2 and 3.
I tried to simulate your scenario:
drop table if exists transaction;
create temp table transaction(
transactionId bigint,
name varchar
) on commit drop;
insert into transaction(transactionId, name) values(1, 'Trans 1');
insert into transaction(transactionId, name) values(2, 'Trans 2');
drop table if exists transactionAction;
create temp table transactionAction(
actionId bigint,
transactionId bigint,
actionType bigint,
value bigint
) on commit drop;
insert into transactionAction(actionId, transactionId, actionType, value) values(1, 1, 1, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(2, 1, 2, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(3, 1, 3, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(4, 2, 1, 1);
insert into transactionAction(actionId, transactionId, actionType, value) values(5, 2, 2, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(6, 2, 3, null);
the query would be
select
distinct
t.*
from
(
select
ta.actionid,
ta.transactionId,
ta.actionType,
ta.value,
/*sum only if action type 1 is different from null*/
sum(case when ta.actionType = 1 and not ta.value is null then 1 else 0 end) over(partition by ta.transactionId) haveActionOne
from
transactionAction ta
) a
inner join transaction t
on a.transactionId = t.transactionId
where
/*indicates if the transaction in its actions has type 1 different from null in its value*/
a.haveActionOne = 1
and a.actionType in (2,3);
As an option, it would be more elegant to move the condition for action 1 into the WHERE clause; this approach eliminates the subqueries.
Select Transaction.transactionid, Max(Transaction.name)
From Transaction Inner Join Transactionaction On (Transaction.transactionid=Transactionaction.transactionid)
Where (actiontype=1 And value Is Not Null) Or actiontype<>1
Group by Transaction.transactionid
Having array_agg(actiontype Order by actiontype) = array[1,2,3]
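For comparison, here is a sketch that checks both conditions in a single pass over transactionaction. The table definitions and column types (integer ids, integer value) are assumptions, since the question doesn't show them; bool_or is a built-in PostgreSQL aggregate.

```sql
-- Assumed schema: only the column names and sample rows come from the question.
create temp table transaction (transactionid int primary key, name text);
create temp table transactionaction
(
    actionid      int primary key,
    transactionid int references transaction(transactionid),
    actiontype    int,
    value         int
);

insert into transaction(transactionid, name)
values (1, 'Trans 1'), (2, 'Trans 2'), (3, 'Trans 3'), (4, 'Trans 4');

insert into transactionaction(actionid, transactionid, actiontype, value)
values (1, 1, 1, null), (2, 1, 2, null), (3, 1, 3, null),
       (4, 2, 1, 1),    (5, 2, 2, null), (6, 2, 3, null);

select t.*
from transaction t
where t.transactionid in (
    select ta.transactionid
    from transactionaction ta
    group by ta.transactionid
    -- the transaction has both action 2 and action 3 ...
    having array_agg(ta.actiontype) @> array[2, 3]
    -- ... and at least one action 1 whose value is not null
       and bool_or(ta.actiontype = 1 and ta.value is not null)
);
-- returns only transaction 2 ('Trans 2')
```

Both conditions live in the same HAVING clause, so the action table is grouped once instead of being queried twice.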

Adding a LEFT JOIN on a INSERT INTO....RETURNING

My query inserts a row and returns the newly inserted row:
INSERT INTO
event_comments(date_posted, e_id, created_by, parent_id, body, num_likes, thread_id)
VALUES(1575770277, 1, '9e028aaa-d265-4e27-9528-30858ed8c13d', 9, 'December 7th', 0, 'zRfs2I')
RETURNING comment_id, date_posted, e_id, created_by, parent_id, body, num_likes, thread_id
I want to join created_by with user_id from my users table:
SELECT * from users WHERE user_id = created_by
Is it possible to join that new returning row with another table row?
Consider using a WITH structure to pass the data from the insert to a query that can then be joined.
Example:
-- Setup some initial tables
create table colors (
id SERIAL primary key,
color VARCHAR UNIQUE
);
create table animals (
id SERIAL primary key,
a_id INTEGER references colors(id),
animal VARCHAR UNIQUE
);
-- provide some initial data in colors
insert into colors (color) values ('red'), ('green'), ('blue');
-- Store returned data in inserted_animal for use in next query
with inserted_animal as (
-- Insert a new record into animals
insert into animals (a_id, animal) values (3, 'fish') returning *
) select * from inserted_animal
left join colors on inserted_animal.a_id = colors.id;
-- Output
-- id | a_id | animal | id | color
-- 1 | 3 | fish | 3 | blue
Explanation:
A WITH query makes the rows returned by an initial query, including rows returned by a RETURNING clause, available as a temporary result set. The statement that follows can keep working with that result set, including joining it to other tables.
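Applied to the tables from the question, the same pattern might look like the sketch below. Only the column names come from the question; the column types and the users.name column are guesses made so that the example runs on its own.

```sql
-- Assumed schema: types and the users.name column are hypothetical.
create temp table users
(
    user_id uuid primary key,
    name    text
);

create temp table event_comments
(
    comment_id  serial primary key,
    date_posted bigint,
    e_id        int,
    created_by  uuid references users(user_id),
    parent_id   int,
    body        text,
    num_likes   int,
    thread_id   text
);

insert into users(user_id, name)
values ('9e028aaa-d265-4e27-9528-30858ed8c13d', 'example user');

-- Insert the comment and join the returned row to its author in one statement.
with new_comment as (
    insert into event_comments(date_posted, e_id, created_by, parent_id, body, num_likes, thread_id)
    values (1575770277, 1, '9e028aaa-d265-4e27-9528-30858ed8c13d', 9, 'December 7th', 0, 'zRfs2I')
    returning *
)
select c.comment_id, c.body, c.created_by, u.name
from new_comment c
left join users u on u.user_id = c.created_by;
```

The left join keeps the new comment in the result even if no matching user row exists.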
You were right, I misunderstood
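This should do it (a PL/pgSQL fragment, so it has to run inside a function body rather than as plain SQL):
DECLARE mycreated_by event_comments.created_by%TYPE;
BEGIN
INSERT INTO
event_comments(date_posted, e_id, created_by, parent_id, body, num_likes, thread_id)
VALUES(1575770277, 1, '9e028aaa-d265-4e27-9528-30858ed8c13d', 9, 'December 7th', 0, 'zRfs2I')
RETURNING created_by INTO mycreated_by;
-- in a function declared RETURNS SETOF users, this SELECT needs a RETURN QUERY prefix
SELECT * from users WHERE user_id = mycreated_by;
END;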

sql join using recursive cte

Edit: Added another case scenario in the notes and updated the sample attachment.
I am trying to write SQL that produces the output attached to this question, along with the sample data.
There are two tables: one with distinct IDs (pk) and their current flag,
and another with an active ID (fk to the pk of the first table) and an inactive ID (fk to the pk of the first table).
The final output should have two columns: the first contains all distinct IDs from the first table, and the second contains the active ID from the second table.
Below is the SQL:
IF OBJECT_ID('tempdb..#main') IS NOT NULL DROP TABLE #main;
IF OBJECT_ID('tempdb..#merges') IS NOT NULL DROP TABLE #merges
IF OBJECT_ID('tempdb..#final') IS NOT NULL DROP TABLE #final
SELECT DISTINCT id,
current
INTO #main
FROM tb_ID t1
--get list of all active_id and inactive_id
SELECT DISTINCT active_id,
inactive_id,
Update_dt
INTO #merges
FROM tb_merges
-- Combine where the id from the main table matched to the inactive_id (should return all the rows from #main)
SELECT id,
active_id AS merged_to_id
INTO #final
FROM (SELECT t1.*,
t2.active_id,
Update_dt ,
Row_number()
OVER (
partition BY id, active_id
ORDER BY Update_dt DESC) AS rn
FROM #main t1
LEFT JOIN #merges t2
ON t1.id = t2.inactive_id) t3
WHERE rn = 1
SELECT *
FROM #final
This SQL partially works. It doesn't work where an ID was once active and then became inactive.
Please note:
the active ID should be the most recent active ID
an ID which doesn't have any active ID should return either null or the ID itself
for an ID where current = 0, the active ID should be the ID that is current in tb_ID
IDs may get interchanged. For example, there are two IDs, 6 and 7: when 6 is active, 7 is inactive, and vice versa. The only way to know the most current active state is by the update date
The attached sample might make this easier to understand.
It looks like I might have to use a recursive CTE to achieve this. Can someone please help?
thank you for your time!
I think you're correct that a recursive CTE looks like a good solution for this. I'm not entirely certain that I've understood exactly what you're asking for, particularly with regard to the update_dt column, just because the data is a little abstract as-is, but I've taken a stab at it, and it does seem to work with your sample data. The comments explain what's going on.
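if object_id('tempdb..#tb_id') is not null drop table #tb_id;
if object_id('tempdb..#tb_merges') is not null drop table #tb_merges;
create table #tb_id (id bigint, [current] bit);
create table #tb_merges (active_id bigint, inactive_id bigint, update_dt datetime2);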
insert #tb_id values
-- Sample data from the question.
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(5, 0),
-- A few additional data to illustrate a deeper search.
(6, 1),
(7, 1),
(8, 1),
(9, 1),
(10, 1);
insert #tb_merges values
-- Sample data from the question.
(3, 1, '2017-01-11T13:09:00'),
(1, 2, '2017-01-11T13:07:00'),
(5, 4, '2013-12-31T14:37:00'),
(4, 5, '2013-01-18T15:43:00'),
-- A few additional data to illustrate a deeper search.
(6, 7, getdate()),
(7, 8, getdate()),
(8, 9, getdate()),
(9, 10, getdate());
if object_id('tempdb..#ValidMerge') is not null
drop table #ValidMerge;
-- Get the subset of merge records whose active_id identifies a "current" id and
-- rank by date so we can consider only the latest merge record for each active_id.
with ValidMergeCTE as
(
select
M.active_id,
M.inactive_id,
[Priority] = row_number() over (partition by M.active_id order by M.update_dt desc)
from
#tb_merges M
inner join #tb_id I on M.active_id = I.id
where
I.[current] = 1
)
select
active_id,
inactive_id
into
#ValidMerge
from
ValidMergeCTE
where
[Priority] = 1;
-- Here's the recursive CTE, which draws on the subset of merges identified above.
with SearchCTE as
(
-- Base case: any record whose active_id is not used as an inactive_id is an endpoint.
select
M.active_id,
M.inactive_id,
Depth = 0
from
#ValidMerge M
where
not exists (select 1 from #ValidMerge M2 where M.active_id = M2.inactive_id)
-- Recursive case: look for records whose active_id matches the inactive_id of a previously
-- identified record.
union all
select
S.active_id,
M.inactive_id,
Depth = S.Depth + 1
from
#ValidMerge M
inner join SearchCTE S on M.active_id = S.inactive_id
)
select
I.id,
S.active_id
from
#tb_id I
left join SearchCTE S on I.id = S.inactive_id;
Results:
id active_id
------------------
1 3
2 3
3 NULL
4 NULL
5 4
6 NULL
7 6
8 6
9 6
10 6
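The answer above is written in T-SQL. For readers on PostgreSQL, here is a hedged port of just the recursive step, using the "deeper search" rows from the sample data. The table name merges is made up for the example, and the filtering by current flag and update date is left out.

```sql
-- Hypothetical table holding only the chain 6 <- 7 <- 8 <- 9 <- 10.
create temp table merges (active_id int, inactive_id int);

insert into merges(active_id, inactive_id)
values (6, 7), (7, 8), (8, 9), (9, 10);

with recursive search as (
    -- Base case: a merge whose active_id never appears as an inactive_id is an endpoint.
    select m.active_id, m.inactive_id, 0 as depth
    from merges m
    where not exists (select 1 from merges m2 where m2.inactive_id = m.active_id)

    union all

    -- Recursive case: follow merges whose active_id is an inactive_id found so far,
    -- carrying the endpoint's active_id along the chain.
    select s.active_id, m.inactive_id, s.depth + 1
    from merges m
    join search s on m.active_id = s.inactive_id
)
select inactive_id as id, active_id, depth
from search
order by id;
-- ids 7, 8, 9 and 10 all resolve to active_id 6, at depths 0, 1, 2 and 3
```

PostgreSQL requires the recursive keyword, whereas T-SQL infers recursion from the self-reference. Note that a cycle in the merge data (such as the 6/7 interchange mentioned in the question) would make this recursion loop, which is why the answer filters the merge rows first.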

Inserting into multiple tables and selecting the first result

Let's say I have two tables that implement a very simple invoice system (note: the schema can't be changed):
create table invoices(
id serial primary key,
parent_invoice_id int null references invoices(id),
name text not null
);
create table line_items(
id serial primary key,
invoice_id int not null references invoices(id),
amount int not null
);
The user has the ability to "clone" an invoice and have it refer to the original "parent" invoice. In the system, the invoice is required directly after the clone (however the line_items are not required). Therefore, after cloning the invoice, the new invoice must be returned. Here's the SQL I'm using to clone an invoice:
with new_invoice_row as (
insert into invoices (parent_invoice_id, name)
values (12345/*invoice_to_clone_id*/, 'Hello World')
returning *
),
new_line_item_rows as (
insert into line_items (invoice_id, amount)
select
new_invoice_row.id, line_items.amount
from line_items
cross join new_invoice_row
where
line_items.invoice_id = 12345/*invoice_to_clone_id*/
returning id
)
select * from new_invoice_row;
Questions:
Is the cross join going to perform well? I was thinking of being able to just remove the cross join to reduce having to do the join, but it wouldn't run (error: missing FROM-clause entry for table "new_invoice_row"):
...
insert into line_items (invoice_id, amount)
select
new_invoice_row.id, line_items.amount
from line_items
where
line_items.invoice_id = 12345
returning id
...
Is there any way that the returning id part of the new_line_item_rows statement can be removed? The new line items aren't needed, so I'd like to avoid the extra overhead if it can improve performance.
Should I stop using a query and move all of this into a function? The system was originally using a MS SQL database, so I'm more familiar with using declare and having multiple statements use the variable.
The first query can return only id and parent_invoice_id.
Use the second value in order to avoid re-writing the argument (as a protection against typos).
Cross join is necessary and correct.
You can drop the returning clause in the second query.
A function is not necessary, although it may be convenient to use.
with new_invoice_row as (
insert into invoices (parent_invoice_id, name)
values (12345, 'Hello World')
returning id, parent_invoice_id
),
new_line_item_rows as (
insert into line_items (invoice_id, amount)
select
new_invoice_row.id, line_items.amount
from line_items
cross join new_invoice_row
where
line_items.invoice_id = new_invoice_row.parent_invoice_id
)
select * from new_invoice_row;
create table invoices(
id serial primary key,
parent_invoice_id int null references invoices(id),
name text not null
);
INSERT INTO invoices(parent_invoice_id, name) VALUES
( NULL, 'One')
,( 1, 'two')
,( NULL, 'three')
;
create table line_items(
id serial primary key,
invoice_id int not null references invoices(id),
amount int not null
);
INSERT INTO line_items (invoice_id, amount) VALUES
(1, 10)
,(1, 11)
,(2, 21)
,(2, 22)
,(3, 33)
;
-- for demonstration purposes: the clone+insert as a prepared statement
-- (this is *not* necessary, only convenient)
PREPARE clone_the_invoice (INTEGER, text, INTEGER) AS
WITH new_invoice_row as (
INSERT into invoices (parent_invoice_id, name)
VALUES ( $1 /*invoice_to_clone_id*/, $2 /*name */ )
RETURNING id)
, new_line_item_rows as (
INSERT into line_items (invoice_id, amount)
SELECT new_invoice_row.id, $3 /* amount */
FROM new_invoice_row
RETURNING id
)
SELECT * FROM new_line_item_rows
;
-- call the prepared statement.
-- This will clone invoice#2,
-- and insert one row in items, referring to the cloned row
-- it returns the new item's id, which is sufficient to
-- find the invoice.id too, when needed.
-- -----------------------------------------------------------------
EXECUTE clone_the_invoice (2, 'four', 123);
-- Check the result
SELECT
iv.id
, iv.parent_invoice_id
, iv.name
, li.id AS lineid
, li.amount
FROM invoices iv
JOIN line_items li ON li.invoice_id = iv.id
;
Result:
CREATE TABLE
INSERT 0 3
CREATE TABLE
INSERT 0 5
PREPARE
id
----
6
(1 row)
id | parent_invoice_id | name | lineid | amount
----+-------------------+-------+--------+--------
1 | | One | 1 | 10
1 | | One | 2 | 11
2 | 1 | two | 3 | 21
2 | 1 | two | 4 | 22
3 | | three | 5 | 33
4 | 2 | four | 6 | 123
(6 rows)
For non-trivial cases, the FKs will need supporting indexes (these are not added automatically, so you have to create them manually):
CREATE INDEX ON invoices (parent_invoice_id);
CREATE INDEX ON line_items (invoice_id);
Update: if you insist on returning the new invoice, here you go:
PREPARE clone_the_invoice2 (INTEGER, text, integer) AS
WITH new_invoice_row as (
INSERT into invoices (parent_invoice_id, name)
VALUES ( $1 /*invoice_to_clone_id*/, $2 )
RETURNING *
)
, new_line_item_rows as (
INSERT into line_items (invoice_id, amount)
SELECT new_invoice_row.id, $3
FROM new_invoice_row
RETURNING *
)
SELECT iv.*
FROM new_invoice_row iv
JOIN new_line_item_rows new ON new.invoice_id = iv.id
;
UPDATE 2 (it appears the OP wants the detail lines to be cloned, too):
-- Clone an invoice
-- INCLUDING all associated line_items
-- --------------------------------------
PREPARE clone_the_invoice3 (INTEGER, text) AS
WITH new_invoice_row as (
INSERT into invoices (parent_invoice_id, name)
VALUES ( $1 /*invoice_to_clone_id*/
, $2 /* name */
)
RETURNING *
)
, new_line_item_rows as (
INSERT into line_items (invoice_id, amount)
SELECT cl.id -- the cloned invoice
, it.amount
FROM line_items it
CROSS JOIN new_invoice_row cl
WHERE it.invoice_id = $1 -- The original invoice
RETURNING *
)
SELECT iv.*
FROM new_invoice_row iv
JOIN new_line_item_rows new ON new.invoice_id = iv.id
;
EXECUTE clone_the_invoice3 (2, 'four');
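On the question of moving this into a function: one possible shape is a plain SQL function that wraps the same statement. This is only a sketch. The function name clone_invoice and its parameter names are made up, and the tables and sample rows are repeated from above so that the example runs on its own in an empty database.

```sql
create table invoices(
    id serial primary key,
    parent_invoice_id int null references invoices(id),
    name text not null
);
create table line_items(
    id serial primary key,
    invoice_id int not null references invoices(id),
    amount int not null
);

insert into invoices(parent_invoice_id, name)
values (null, 'One'), (1, 'two'), (null, 'three');
insert into line_items(invoice_id, amount)
values (1, 10), (1, 11), (2, 21), (2, 22), (3, 33);

-- Hypothetical wrapper: clones one invoice with its line items
-- and returns the new invoice row.
create function clone_invoice(p_invoice_id int, p_name text)
returns invoices
language sql
as $$
    with new_invoice_row as (
        insert into invoices (parent_invoice_id, name)
        values (p_invoice_id, p_name)
        returning *
    ),
    new_line_item_rows as (
        -- no RETURNING needed: a data-modifying CTE runs to completion
        -- even when the outer query never reads from it
        insert into line_items (invoice_id, amount)
        select n.id, li.amount
        from line_items li
        cross join new_invoice_row n
        where li.invoice_id = p_invoice_id
    )
    select * from new_invoice_row;
$$;

select * from clone_invoice(2, 'four');
-- with the fresh tables above this returns id 4, parent_invoice_id 2, name 'four'

select invoice_id, amount from line_items where invoice_id = 4 order by id;
-- the two cloned line items, with amounts 21 and 22
```

The caller passes the invoice id only once, which gives the same protection against typos as reusing parent_invoice_id in the WHERE clause.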

query to count number of unique relations

I have 3 tables:
t_user (id, name)
t_user_deal (id, user_id, deal_id)
t_deal (id, title)
Multiple users can be linked to the same deal. (I'm using Oracle, but it should be similar; I can adapt it.)
How can I get all the users (name), each with the number of unique users he made a deal with?
let's explain with some data:
t_user:
id, name
1, joe
2, mike
3, John
t_deal:
id, title
1, deal number 1
2, deal number 2
t_user_deal:
id, user_id, deal_id
1, 1, 1
2, 2, 1
3, 1, 2
4, 3, 2
the result I expect:
user_name, number of unique user he made a deal with
Joe, 2
Mike, 1
John, 1
I tried this, but I didn't get the expected result:
SELECT tu.name,
count(tu.id) AS nbRelations
FROM t_user tu
INNER JOIN t_user_deal tud ON tu.id = tud.user_id
INNER JOIN t_deal td ON tud.deal_id = td.id
WHERE
(
td.id IN
(
SELECT DISTINCT td.id
FROM t_user_deal tud2
INNER JOIN t_deal td2 ON tud2.deal_id = td2.id
WHERE tud.id <> tud2.user_id
)
)
GROUP BY tu.id
ORDER BY nbRelations DESC
thanks for your help
This should get you the result
SELECT id1, count(id2),name
FROM (
SELECT distinct tud1.user_id id1 , tud2.user_id id2
FROM t_user_deal tud1, t_user_deal tud2
WHERE tud1.deal_id = tud2.deal_id
and tud1.user_id <> tud2.user_id) as tab, t_user tu
WHERE tu.id = id1
GROUP BY id1,name
Something like
select name, NVL (i.ud, 0) ud from t_user join (
SELECT user_id, count(*) ud from t_user_deal group by user_id) i on t_user.id = i.user_id
where i.ud > 0
Unless I'm missing something here. It actually sounds like your question refers to a second user in the t_user_deal table. The model you've described here doesn't include that.
PostgreSQL example:
create table t_user (id int, name varchar(255)) ;
create table t_deal (id int, title varchar(255)) ;
create table t_user_deal (id int, user_id int, deal_id int) ;
insert into t_user values (1, 'joe'), (2, 'mike'), (3, 'john') ;
insert into t_deal values (1, 'deal 1'), (2, 'deal 2') ;
insert into t_user_deal values (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 3, 2) ;
And the query.....
SELECT
name, COUNT(DISTINCT deal_id)
FROM
t_user INNER JOIN t_user_deal ON (t_user.id = t_user_deal.user_id)
GROUP BY
user_id, name ;
The DISTINCT might not be necessary (in the COUNT(), that is). Depends on how clean your data is (e.g., no duplicate rows!)
Here's the result in PostgreSQL:
name | count
------+-------
joe | 2
mike | 1
john | 1
(3 rows)
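The query above counts deals per user, which happens to give the same numbers for this sample data. If the goal is to count the distinct other users who share a deal, a self-join on t_user_deal is one way to sketch it. The temp tables below just repeat the sample data so the example runs on its own; the alias names are arbitrary.

```sql
create temp table t_user (id int primary key, name varchar(255));
create temp table t_user_deal (id int primary key, user_id int, deal_id int);

insert into t_user values (1, 'joe'), (2, 'mike'), (3, 'john');
insert into t_user_deal values (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 3, 2);

select u.name,
       count(distinct other.user_id) as nb_relations
from t_user u
-- the deals this user takes part in
left join t_user_deal mine
       on mine.user_id = u.id
-- every other user linked to the same deal
left join t_user_deal other
       on other.deal_id = mine.deal_id
      and other.user_id <> mine.user_id
group by u.id, u.name
order by u.id;
-- joe 2, mike 1, john 1
```

The left joins keep users without any deal in the result with a count of 0, because count(distinct ...) ignores nulls.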