Matching array of rows with multiple conditions - sql

I have two tables, transaction and transactionaction, with the following columns. transactionid in transactionaction is a foreign key to transaction.
transaction:
transactionid  name
----------------------
1              Trans 1
2              Trans 2
3              Trans 3
4              Trans 4
transactionaction:
actionid  transactionid  actiontype  value
------------------------------------------
1         1              1           null
2         1              2           null
3         1              3           null
4         2              1           1
5         2              2           null
6         2              3           null
I need to find every transaction that contains all actions passed by the user. It's important to note that some actions can be filtered based only on actionType (actions 2, 3) and some also on value (action 1).
So if the user wants to find transactions with actions 2 and 3, he should get transactions 1 and 2. For this case, with the help of this answer https://stackoverflow.com/a/41140629/12035106, I created this query.
SELECT * from transaction
WHERE transactionid in (
SELECT transactionid
FROM public.transactionaction
group by transactionid
having array_agg(actiontype) @> array[2,3]
)
However, action 1 needs to take value into consideration, because action 1 with value == null is different from action 1 with value != null. In this case, I cannot really use the previous query. I came up with this query.
SELECT * from transaction
WHERE transactionid in (
SELECT transactionid
FROM public.transactionaction
group by transactionid
having array_agg(actiontype) @> array[2,3]
) AND transactionid in (
SELECT transactionid
FROM public.transactionaction
WHERE actiontype = 1 AND value is not null
)
This one works: as a result, I get only transaction 2. But I feel I overcomplicated it, because this query scans the same table multiple times. I created an index on actiontype, so the query plan looks better, but maybe there is an easier way to achieve the same result?

I think there is an underlying design problem here, but I will try to help with the query anyway.
You can use a subquery that determines whether the transaction has an action of type 1 with a non-null value (using an aggregate function with an OVER clause to avoid a GROUP BY). Once you know that, you can filter on action types 2 and 3.
I tried to simulate your scenario:
drop table if exists transaction;
create temp table transaction(
transactionId bigint,
name varchar
) on commit drop;
insert into transaction(transactionId, name) values(1, 'Trans 1');
insert into transaction(transactionId, name) values(2, 'Trans 2');
drop table if exists transactionAction;
create temp table transactionAction(
actionId bigint,
transactionId bigint,
actionType bigint,
value bigint
) on commit drop;
insert into transactionAction(actionId, transactionId, actionType, value) values(1, 1, 1, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(2, 1, 2, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(3, 1, 3, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(4, 2, 1, 1);
insert into transactionAction(actionId, transactionId, actionType, value) values(5, 2, 2, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(6, 2, 3, null);
The query would be:
select
distinct
t.*
from
(
select
ta.actionid,
ta.transactionId,
ta.actionType,
ta.value,
/*sum only if action type 1 is different from null*/
sum(case when ta.actionType = 1 and not ta.value is null then 1 else 0 end) over(partition by ta.transactionId) haveActionOne
from
transactionAction ta
) a
inner join transaction t
on a.transactionId = t.transactionId
where
/*indicates if the transaction in its actions has type 1 different from null in its value*/
a.haveActionOne = 1
and a.actionType in (2,3);
Regards,

As an option, it would be more elegant to move the condition for action 1 into the WHERE clause; this approach eliminates the subqueries.
Select Transaction.transactionid, Max(Transaction.name)
From Transaction Inner Join Transactionaction On (Transaction.transactionid=Transactionaction.transactionid)
Where (actiontype=1 And value Is Not Null) Or actiontype<>1
Group by Transaction.transactionid
Having array_agg(actiontype Order by actiontype) = array[1,2,3]
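The filter-then-group idea above can be checked against the question's sample rows. This is only a sketch: Python's sqlite3 module stands in for PostgreSQL, and because SQLite has no array_agg, the HAVING clause uses COUNT(DISTINCT actiontype) = 3 instead (an assumption, not the original query).

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactionaction (
    actionid INTEGER, transactionid INTEGER, actiontype INTEGER, value INTEGER
);
INSERT INTO transactionaction VALUES
    (1, 1, 1, NULL), (2, 1, 2, NULL), (3, 1, 3, NULL),
    (4, 2, 1, 1),    (5, 2, 2, NULL), (6, 2, 3, NULL);
""")

# Keep action 1 only when it has a value, keep actions 2 and 3 as they are,
# then require that all three action types survived the filter.
rows = conn.execute("""
    SELECT transactionid
    FROM transactionaction
    WHERE (actiontype = 1 AND value IS NOT NULL) OR actiontype IN (2, 3)
    GROUP BY transactionid
    HAVING COUNT(DISTINCT actiontype) = 3
    ORDER BY transactionid
""").fetchall()

matching_ids = [r[0] for r in rows]
print(matching_ids)
```

Transaction 1 is dropped because its action 1 row has a null value, so only two distinct action types remain after the WHERE filter.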

Related

sql join using recursive cte

Edit: Added another case scenario in the notes and updated the sample attachment.
I am trying to write SQL that produces the output attached to this question, along with sample data.
There are two tables: one with distinct IDs (pk) and their current flag,
and another with an Active ID (fk to the pk of the first table) and an Inactive ID (fk to the pk of the first table).
The final output should return two columns: the first consists of all distinct IDs from the first table, and the second should contain the Active ID from the second table.
Below is the SQL:
IF OBJECT_ID('tempdb..#main') IS NOT NULL DROP TABLE #main;
IF OBJECT_ID('tempdb..#merges') IS NOT NULL DROP TABLE #merges
IF OBJECT_ID('tempdb..#final') IS NOT NULL DROP TABLE #final
SELECT DISTINCT id,
current
INTO #main
FROM tb_ID t1
--get list of all active_id and inactive_id
SELECT DISTINCT active_id,
inactive_id,
Update_dt
INTO #merges
FROM tb_merges
-- Combine where the id from the main table matched to the inactive_id (should return all the rows from #main)
SELECT id,
active_id AS merged_to_id
INTO #final
FROM (SELECT t1.*,
t2.active_id,
Update_dt ,
Row_number()
OVER (
partition BY id, active_id
ORDER BY Update_dt DESC) AS rn
FROM #main t1
LEFT JOIN #merges t2
ON t1.id = t2.inactive_id) t3
WHERE rn = 1
SELECT *
FROM #final
This SQL partially works. It doesn't work where an ID was once active and then became inactive.
Please note:
the active ID should be the most recent active ID
an ID that doesn't have any active ID should return either null or the ID itself
for IDs where current = 0, the active ID should be the ID that is current in tb_ID
IDs may get interchanged. For example, there are two IDs, 6 and 7: when 6 is active, 7 is inactive, and vice versa. The only way to know the most current active state is by the update date
The attached sample might be easier to understand.
Looks like I might have to use a recursive CTE to achieve the results. Can someone please help?
Thank you for your time!
I think you're correct that a recursive CTE looks like a good solution for this. I'm not entirely certain that I've understood exactly what you're asking for, particularly with regard to the update_dt column, just because the data is a little abstract as-is, but I've taken a stab at it, and it does seem to work with your sample data. The comments explain what's going on.
create table #tb_id (id bigint, [current] bit);
create table #tb_merges (active_id bigint, inactive_id bigint, update_dt datetime2);
insert #tb_id values
-- Sample data from the question.
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(5, 0),
-- A few additional data to illustrate a deeper search.
(6, 1),
(7, 1),
(8, 1),
(9, 1),
(10, 1);
insert #tb_merges values
-- Sample data from the question.
(3, 1, '2017-01-11T13:09:00'),
(1, 2, '2017-01-11T13:07:00'),
(5, 4, '2013-12-31T14:37:00'),
(4, 5, '2013-01-18T15:43:00'),
-- A few additional data to illustrate a deeper search.
(6, 7, getdate()),
(7, 8, getdate()),
(8, 9, getdate()),
(9, 10, getdate());
if object_id('tempdb..#ValidMerge') is not null
drop table #ValidMerge;
-- Get the subset of merge records whose active_id identifies a "current" id and
-- rank by date so we can consider only the latest merge record for each active_id.
with ValidMergeCTE as
(
select
M.active_id,
M.inactive_id,
[Priority] = row_number() over (partition by M.active_id order by M.update_dt desc)
from
#tb_merges M
inner join #tb_id I on M.active_id = I.id
where
I.[current] = 1
)
select
active_id,
inactive_id
into
#ValidMerge
from
ValidMergeCTE
where
[Priority] = 1;
-- Here's the recursive CTE, which draws on the subset of merges identified above.
with SearchCTE as
(
-- Base case: any record whose active_id is not used as an inactive_id is an endpoint.
select
M.active_id,
M.inactive_id,
Depth = 0
from
#ValidMerge M
where
not exists (select 1 from #ValidMerge M2 where M.active_id = M2.inactive_id)
-- Recursive case: look for records whose active_id matches the inactive_id of a previously
-- identified record.
union all
select
S.active_id,
M.inactive_id,
Depth = S.Depth + 1
from
#ValidMerge M
inner join SearchCTE S on M.active_id = S.inactive_id
)
select
I.id,
S.active_id
from
#tb_id I
left join SearchCTE S on I.id = S.inactive_id;
Results:
id active_id
------------------
1 3
2 3
3 NULL
4 NULL
5 4
6 NULL
7 6
8 6
9 6
10 6
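For readers without a SQL Server instance, the recursive step can be sketched with SQLite through Python's sqlite3 module. The table valid_merge is a hypothetical stand-in for #ValidMerge and is loaded with the already filtered rows (the (5, 4) merge is left out because id 5 is not current), so the ranking step is skipped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_id (id INTEGER);
CREATE TABLE valid_merge (active_id INTEGER, inactive_id INTEGER);
INSERT INTO tb_id VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
-- Merges whose active_id is a current id (pre-filtered by hand for this sketch).
INSERT INTO valid_merge VALUES (3,1),(1,2),(4,5),(6,7),(7,8),(8,9),(9,10);
""")

rows = conn.execute("""
    WITH RECURSIVE search(active_id, inactive_id) AS (
        -- Base case: active_id never appears as an inactive_id, so it is an endpoint.
        SELECT m.active_id, m.inactive_id
        FROM valid_merge m
        WHERE NOT EXISTS (
            SELECT 1 FROM valid_merge m2 WHERE m2.inactive_id = m.active_id
        )
        UNION ALL
        -- Recursive case: carry the endpoint down to ids merged further back.
        SELECT s.active_id, m.inactive_id
        FROM valid_merge m
        JOIN search s ON m.active_id = s.inactive_id
    )
    SELECT i.id, s.active_id
    FROM tb_id i
    LEFT JOIN search s ON i.id = s.inactive_id
    ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```

Each id is paired with the endpoint of its merge chain, or with None when it was never merged into another id.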

Postgres: upsert a row and update a primary key column

Suppose I have two tables in my Postgres database:
create table transactions
(
id bigint primary key,
doc_id bigint not null,
-- lots of other columns...
amount numeric not null
);
-- same columns
create temporary table updated_transactions
(
id bigint primary key,
doc_id bigint not null,
-- lots of other columns...
amount numeric not null
);
Both tables have just a primary key, and no unique indexes.
I need to upsert rows from updated_transactions into transactions using the following rules:
id column values in transactions and updated_transactions don't match
other columns like doc_id, etc. (except amount) should match
when a matching row is found, update both amount and id columns
when a matching row is not found, insert it
id values in updated_transactions are taken from a sequence.
A business object just populates updated_transactions and then merges the
new or updated rows from it into transactions using an upsert query.
So my old unchanged transactions keep their ids intact, and the updated ones
are assigned new ids.
In MSSQL and Oracle, it would be a merge statement similar to this:
merge into transactions t
using updated_transactions ut on t.doc_id = ut.doc_id, ...
when matched then
update set t.id = ut.id, t.amount = ut.amount
when not matched then
insert (t.id, t.doc_id, ..., t.amount)
values (ut.id, ut.doc_id, ..., ut.amount);
In PostgreSQL, I suppose it should be something like this:
insert into transactions(id, doc_id, ..., amount)
select coalesce(t.id, ut.id), ut.doc_id, ... ut.amount
from updated_transactions ut
left join transactions t on t.doc_id = ut.doc_id, ....
on conflict
on constraint transactions_pkey
do update
set amount = excluded.amount, id = excluded.id
The problem is with the do update clause: excluded.id is an old value
from transactions table, while I need a new value from updated_transactions.
ut.id value is inaccessible for the do update clause, and the only thing I can
use is the excluded row. But the excluded row has only coalesce(t.id, ut.id)
expression which returns old id values for the existing rows.
Is it possible to update both id and amount columns using the upsert query?
Create a unique index on the columns you use as the key and pass its name in your upsert expression, so that it is used instead of the pkey.
Then a row is inserted if no match is found, using the ID from updated_transactions. If a match is found, you can use excluded.id to get the ID from updated_transactions.
I think that left join transactions is redundant.
So it would look kinda like this:
insert into transactions(id, doc_id, ..., amount)
select ut.id, ut.doc_id, ... ut.amount
from updated_transactions ut
on conflict
on constraint transactions_multi_column_unique_index
do update
set amount = excluded.amount, id = excluded.id
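A minimal sketch of this idea, run on SQLite through Python's sqlite3 module instead of PostgreSQL. The key columns (doc_id, unit_id), the index name and the sample rows are assumptions for illustration. The conflict target is written as a column list, ON CONFLICT (doc_id, unit_id), which lets the database pick the matching unique index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id BIGINT PRIMARY KEY, doc_id BIGINT, unit_id BIGINT, amount NUMERIC
);
CREATE TABLE updated_transactions (
    id BIGINT PRIMARY KEY, doc_id BIGINT, unit_id BIGINT, amount NUMERIC
);
-- Hypothetical unique index on the assumed key columns.
CREATE UNIQUE INDEX transactions_doc_unit_idx ON transactions (doc_id, unit_id);
INSERT INTO transactions VALUES (1, 1, 1, 10), (2, 1, 2, 15), (3, 1, 3, 10);
INSERT INTO updated_transactions VALUES (6, 1, 1, 11), (7, 1, 2, 15), (8, 1, 4, 20);
""")

with conn:  # commit on success, roll back on error
    # "WHERE true" is needed by SQLite to parse INSERT ... SELECT with an upsert clause.
    conn.execute("""
        INSERT INTO transactions (id, doc_id, unit_id, amount)
        SELECT id, doc_id, unit_id, amount
        FROM updated_transactions
        WHERE true
        ON CONFLICT (doc_id, unit_id) DO UPDATE
        SET id = excluded.id, amount = excluded.amount
    """)

merged = conn.execute(
    "SELECT id, doc_id, unit_id, amount FROM transactions ORDER BY id"
).fetchall()
print(merged)
```

Because the conflict is raised on the key columns rather than on id, excluded.id carries the new id from updated_transactions.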
Looks like the task can be accomplished using writable CTEs instead of the plain upsert.
First, I'll post the easier version of the query that answers the original question as it was asked. This solution assumes that doc_id, unit_id columns address a candidate key, but doesn't require a unique index on these columns.
Test data:
create temp table transactions
(
id bigint primary key,
doc_id bigint,
unit_id bigint,
amount numeric
);
create temp table updated_transactions
(
id bigint primary key,
doc_id bigint,
unit_id bigint,
amount numeric
);
insert into transactions(id, doc_id, unit_id, amount)
values (1, 1, 1, 10), (2, 1, 2, 15), (3, 1, 3, 10);
insert into updated_transactions(id, doc_id, unit_id, amount)
values (6, 1, 1, 11), (7, 1, 2, 15), (8, 1, 4, 20);
The query to merge updated_transactions into transactions:
with new_values as
(
select ut.id new_id, t.id old_id, ut.doc_id, ut.unit_id, ut.amount
from updated_transactions ut
left join transactions t
on t.doc_id = ut.doc_id and t.unit_id = ut.unit_id
),
updated as
(
update transactions tr
set id = nv.new_id, amount = nv.amount
from new_values nv
where id = nv.old_id
returning tr.*
)
insert into transactions(id, doc_id, unit_id, amount)
select ut.new_id, ut.doc_id, ut.unit_id, ut.amount
from new_values ut
where ut.new_id not in (select id from updated);
The results:
select * from transactions
-- id | doc_id | unit_id | amount
------+--------+---------+-------
-- 3 | 1 | 3 | 10 -- not changed
-- 6 | 1 | 1 | 11 -- updated
-- 7 | 1 | 2 | 15 -- updated
-- 8 | 1 | 4 | 20 -- inserted
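The same update-then-insert sequence can be tried without writable CTEs. The sketch below is an assumption-laden port to SQLite via Python's sqlite3 module: the two steps run as separate statements inside one transaction, and correlated subqueries replace UPDATE ... FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id BIGINT PRIMARY KEY, doc_id BIGINT, unit_id BIGINT, amount NUMERIC
);
CREATE TABLE updated_transactions (
    id BIGINT PRIMARY KEY, doc_id BIGINT, unit_id BIGINT, amount NUMERIC
);
INSERT INTO transactions VALUES (1, 1, 1, 10), (2, 1, 2, 15), (3, 1, 3, 10);
INSERT INTO updated_transactions VALUES (6, 1, 1, 11), (7, 1, 2, 15), (8, 1, 4, 20);
""")

# Join condition shared by the subqueries below (doc_id, unit_id as candidate key).
match = "ut.doc_id = transactions.doc_id AND ut.unit_id = transactions.unit_id"

with conn:  # both statements commit together or not at all
    # Step 1: rows that have a match get the new id and the new amount.
    conn.execute(f"""
        UPDATE transactions
        SET id = (SELECT ut.id FROM updated_transactions ut WHERE {match}),
            amount = (SELECT ut.amount FROM updated_transactions ut WHERE {match})
        WHERE EXISTS (SELECT 1 FROM updated_transactions ut WHERE {match})
    """)
    # Step 2: rows whose new id is still unknown to transactions are inserted.
    conn.execute("""
        INSERT INTO transactions (id, doc_id, unit_id, amount)
        SELECT ut.id, ut.doc_id, ut.unit_id, ut.amount
        FROM updated_transactions ut
        WHERE ut.id NOT IN (SELECT id FROM transactions)
    """)

result = conn.execute(
    "SELECT id, doc_id, unit_id, amount FROM transactions ORDER BY id"
).fetchall()
print(result)
```

Unlike the single-statement CTE version, the second statement here sees the effect of the first, which is why it can simply test the new ids against transactions.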
In my real application doc_id, unit_id aren't always unique, so they don't represent a candidate key. To match the rows I take into account the row number, calculated for the rows sorted by their ids. So here's my second solution.
Test data:
-- the tables are the same as above
insert into transactions(id, doc_id, unit_id, amount)
values (1, 1, 1, 10), (2, 1, 1, 15), (3, 1, 3, 10);
insert into updated_transactions(id, doc_id, unit_id, amount)
values (6, 1, 1, 11), (7, 1, 1, 15), (8, 1, 4, 20);
The merge query:
with trans as
(
select id, doc_id, unit_id, amount,
row_number() over(partition by doc_id, unit_id order by id) row_num
from transactions
),
updated_trans as
(
select id, doc_id, unit_id, amount,
row_number() over(partition by doc_id, unit_id order by id) row_num
from updated_transactions
),
new_values as
(
select ut.id new_id, t.id old_id, ut.doc_id, ut.unit_id, ut.amount
from updated_trans ut
left join trans t
on t.doc_id = ut.doc_id and t.unit_id = ut.unit_id and t.row_num = ut.row_num
),
updated as
(
update transactions tr
set id = nv.new_id, amount = nv.amount
from new_values nv
where id = nv.old_id
returning tr.*
)
insert into transactions(id, doc_id, unit_id, amount)
select ut.new_id, ut.doc_id, ut.unit_id, ut.amount
from new_values ut
where ut.new_id not in (select id from updated);
The results:
select * from transactions;
-- id | doc_id | unit_id | amount
------+--------+---------+-------
-- 3 | 1 | 3 | 10 -- not changed
-- 6 | 1 | 1 | 11 -- updated
-- 7 | 1 | 1 | 15 -- updated
-- 8 | 1 | 4 | 20 -- inserted
References:
Insert on duplicate update in PostgreSQL
Upserting via Writeable CTE
Waiting for 9.1 — Writable CTE
Why is UPSERT so complicated?

Operation on each row from select query

How do I execute an additional query (UPDATE) on each row returned by a SELECT?
I have to take the amount from each selected row and add it to the user's balance table.
Example:
status 0 - open
status 1 - processed
status 2 - closed
My select statement:
select id, user_id, sell_amount, sell_currency_id
from (select id, user_id, sell_amount, sell_currency_id,
sum(sell_amount)
over (order by buy_amount/sell_amount ASC, date_add ASC) as cumsell
from market t
where (status = 0 or status = 1) and type = 0
) t
where 0 <= cumsell and 7 > cumsell - sell_amount;
Select result from market table
id;user_id;amount;status
4;1;1.00000000;0
6;2;2.60000000;0
5;3;2.00000000;0
7;4;4.00000000;0
We take an amount of 7 and send it to the user's balance table.
id;user_id;amount;status
4;1;0.00000000;2 -- took 1, sum 1, status changed to 2
6;2;0.00000000;2 -- took 2.6, sum=3.6, status changed to 2
5;3;0.00000000;2 -- took 2, sum 5.6, status changed to 2
7;4;2.60000000;1 -- took 1.4, sum 7.0, status changed to 1 (because 2.6 is left to close)
User's balance table
user_id;balance
5;7 -- added 7 from previous operation
Postgres version 9.3
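The expected allocation above can be reproduced with a short loop, which may help when checking a SQL solution against it. This is a plain Python sketch, not database code; the function name allocate and the tuple layout are hypothetical, and the rows are assumed to be already sorted the way the window function orders them.

```python
from decimal import Decimal

def allocate(orders, wanted):
    """Take up to `wanted` from orders given as (id, user_id, amount) tuples.

    Returns the orders with their remaining amount and new status,
    plus the total amount actually taken.
    """
    remaining = wanted
    result = []
    for order_id, user_id, amount in orders:
        taken = min(amount, remaining)
        remaining -= taken
        left = amount - taken
        if taken == 0:
            status = 0   # untouched, still open
        elif left == 0:
            status = 2   # fully consumed, closed
        else:
            status = 1   # partially consumed, processed
        result.append((order_id, user_id, left, status))
    return result, wanted - remaining

orders = [
    (4, 1, Decimal("1.0")),
    (6, 2, Decimal("2.6")),
    (5, 3, Decimal("2.0")),
    (7, 4, Decimal("4.0")),
]
rows, total_taken = allocate(orders, Decimal("7"))
for row in rows:
    print(row)
print(total_taken)
```

Decimal is used so that amounts such as 2.6 and 1.4 do not pick up binary floating-point noise.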
The general principle is to use UPDATE ... FROM over a subquery. Your example is too hard to turn into useful CREATE TABLE and SELECT statements, so I've made up a quick dummy dataset:
CREATE TABLE balances (user_id integer, balance numeric);
INSERT INTO balances (user_id, balance) VALUES (1,0), (2, 2.1), (3, 99);
CREATE TABLE transactions (user_id integer, amount numeric, applied boolean default 'f');
INSERT INTO transactions (user_id, amount) VALUES (1, 22), (1, 10), (2, -10), (4, 1000000);
If you wanted to apply the transactions to the balances you would do something like:
BEGIN;
LOCK TABLE balances IN EXCLUSIVE MODE;
LOCK TABLE transactions IN EXCLUSIVE MODE;
UPDATE balances SET balance = balance + t.amount
FROM (
SELECT t2.user_id, sum(t2.amount) AS amount
FROM transactions t2
GROUP BY t2.user_id
) t
WHERE balances.user_id = t.user_id;
UPDATE transactions
SET applied = true
FROM balances b
WHERE transactions.user_id = b.user_id;
COMMIT;
The LOCK statements are important for correctness in the presence of concurrent inserts/updates.
The second UPDATE marks the transactions as applied; you might not need something like that in your design.
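The pattern can be tried end to end with the dummy dataset. The sketch below uses SQLite through Python's sqlite3 module, so it differs from the PostgreSQL statements in a few labeled ways: correlated subqueries replace UPDATE ... FROM, there are no LOCK TABLE statements, and an applied = 0 filter is added so that re-running it does not apply a transaction twice (that filter is an assumption, not part of the answer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE balances (user_id INTEGER, balance NUMERIC);
INSERT INTO balances VALUES (1, 0), (2, 2.1), (3, 99);
CREATE TABLE transactions (user_id INTEGER, amount NUMERIC, applied INTEGER DEFAULT 0);
INSERT INTO transactions (user_id, amount)
VALUES (1, 22), (1, 10), (2, -10), (4, 1000000);
""")

with conn:  # single transaction: commit on success, roll back on error
    # Add the per-user sum of unapplied transactions to each balance.
    conn.execute("""
        UPDATE balances
        SET balance = balance + (
            SELECT SUM(t.amount) FROM transactions t
            WHERE t.user_id = balances.user_id AND t.applied = 0
        )
        WHERE EXISTS (
            SELECT 1 FROM transactions t
            WHERE t.user_id = balances.user_id AND t.applied = 0
        )
    """)
    # Mark the transactions of users that have a balance row as applied.
    conn.execute("""
        UPDATE transactions
        SET applied = 1
        WHERE applied = 0 AND user_id IN (SELECT user_id FROM balances)
    """)

balances = dict(conn.execute("SELECT user_id, balance FROM balances"))
applied = conn.execute(
    "SELECT user_id, amount, applied FROM transactions ORDER BY user_id, amount"
).fetchall()
print(balances)
print(applied)
```

The transaction for user 4 stays unapplied because that user has no row in balances.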

SQL Server 2008, how to check if multi records exist in the DB?

I have 3 tables:
recipe:
id, name
ingredient:
id, name
recipeingredient:
id, recipeId, ingredientId, quantity
Every time a customer creates a new recipe, I need to check the recipeingredient table to verify whether this recipe already exists. If the ingredientId and quantity values are exactly the same, I will tell the customer the recipe already exists. Since I need to check multiple rows, I need help writing this query.
Knowing your ingredients and quantities, you can do something like this:
select recipeId as ExistingRecipeID
from recipeingredient
where (ingredientId = 1 and quantity = 1)
or (ingredientId = 8 and quantity = 1)
or (ingredientId = 13 and quantity = 1)
group by recipeId
having count(*) = 3 -- must match the number of ingredients in the WHERE clause
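Note that a recipe containing these three ingredients plus others would also pass that check, since the WHERE clause discards the extra rows before counting. A possible variant that also compares the total ingredient count is sketched below with Python's sqlite3 module in place of SQL Server; the helper name find_exact_matches and the sample rows are assumptions for illustration.

```python
import sqlite3

def find_exact_matches(conn, new_items):
    """Return recipeIds whose (ingredientId, quantity) rows equal new_items exactly."""
    if not new_items:
        return []
    n = len(new_items)
    match = " OR ".join("(ingredientId = ? AND quantity = ?)" for _ in new_items)
    sql = f"""
        SELECT recipeId
        FROM recipeingredient
        GROUP BY recipeId
        HAVING COUNT(*) = ?                                     -- no extra ingredients
           AND SUM(CASE WHEN {match} THEN 1 ELSE 0 END) = ?     -- every row matches
        ORDER BY recipeId
    """
    # Placeholders are bound in textual order: count, the pairs, count again.
    params = [n] + [value for item in new_items for value in item] + [n]
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipeingredient (recipeId INTEGER, ingredientId INTEGER, quantity INTEGER);
INSERT INTO recipeingredient VALUES
    (1, 1, 1), (1, 2, 10),
    (2, 1, 1), (2, 2, 10), (2, 3, 10),
    (3, 1, 1),
    (4, 1, 1), (4, 3, 10),
    (5, 1, 1), (5, 2, 10);
""")

existing = find_exact_matches(conn, [(1, 1), (2, 10)])
print(existing)
```

The sketch assumes a recipe never lists the same ingredient twice; with duplicates, the counts would need COUNT(DISTINCT ...) instead.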
I originally thought that the following query would find pairs of recipes that have exactly the same ingredients:
select ri1.recipeId, ri2.recipeId
from RecipeIngredient ri1 full outer join
RecipeIngredient ri2
on ri1.ingredientId = ri2.ingredientId and
ri1.quantity = ri2.quantity and
ri1.recipeId < ri2.recipeId
group by ri1.recipeId, ri2.recipeId
having count(ri1.id) = count(ri2.id) and -- same number of ingredients
count(ri1.id) = count(*) and -- all r1 ingredients are present
count(*) = count(ri2.id) -- all r2 ingredients are present
However, this query doesn't count things correctly, because the mismatches don't have the right pairs of ids. Alas.
The following does do the correct comparison. It counts the ingredients in each recipe before the join, so this value can just be compared on all matching rows.
select ri1.recipeId, ri2.recipeId
from (select ri.*, COUNT(*) over (partition by recipeid) as numingredients
from #RecipeIngredient ri
) ri1 full outer join
(select ri.*, COUNT(*) over (partition by recipeid) as numingredients
from #RecipeIngredient ri
) ri2
on ri1.ingredientId = ri2.ingredientId and
ri1.quantity = ri2.quantity and
ri1.recipeId < ri2.recipeId
group by ri1.recipeId, ri2.recipeId
having max(ri1.numingredients) = max(ri2.numingredients) and
max(ri1.numingredients) = count(*)
The having clause guarantees that each recipe has the same number of ingredients, and that the number of matching ingredients equals that total. This time, I've tested it on the following data:
insert into #recipeingredient select 1, 1, 1
insert into #recipeingredient select 1, 2, 10
insert into #recipeingredient select 2, 1, 1
insert into #recipeingredient select 2, 2, 10
insert into #recipeingredient select 2, 3, 10
insert into #recipeingredient select 3, 1, 1
insert into #recipeingredient select 4, 1, 1
insert into #recipeingredient select 4, 3, 10
insert into #recipeingredient select 5, 1, 1
insert into #recipeingredient select 5, 2, 10
If you have a new recipe, you can modify this query to just look for the recipe in one of the tables (say ri1) using an additional condition on the on clause.
If you place the ingredients in a temporary table, you can substitute one of these tables, say ri1, with the new table.
You might try something like this to find if you have a duplicate:
-- Setup test data
create table #recipeingredient (
id int not null primary key identity
, recipeId int not null
, ingredientId int not null
, quantity int not null
)
insert into #recipeingredient select 1, 1, 1
insert into #recipeingredient select 1, 2, 10
insert into #recipeingredient select 2, 1, 1
insert into #recipeingredient select 2, 2, 10
-- Actual Query
if exists (
select *
from #recipeingredient old
full outer join #recipeingredient new
on old.recipeId != new.recipeId -- Different recipes
and old.ingredientId = new.ingredientId -- but same ingredients
and old.quantity = new.quantity -- and same quantities
where old.id is null -- Match not found
or new.id is null -- Match not found
)
begin
select cast(0 as bit) as IsDuplicateRecipe
end
else begin
select cast(1 as bit) as IsDuplicateRecipe
end
Since this is really only searching for a duplicate, you might want to substitute a temp table or pass a table variable for the "new" table. This way you wouldn't have to insert the new records before doing your search. You could also insert into the base tables, wrap the whole thing in a transaction and rollback based upon the results.

Find rows with same ID and have a particular set of names

EDIT:
I have a table with 3 columns like so.
ID NAME REV
1 A 0
1 B 0
1 C 0
2 A 1
2 B 0
2 C 0
3 A 1
3 B 1
I want to find the ID which has a particular set of names and for which the REV is the same on every row.
example:
Edit2: GBN's solution would have worked perfectly, but I do not have access to create new tables. The added constraint is that no new tables can be created.
if input = A,B then output is 3
if input = A ,B,C then output is 1 and not 1,2 since the rev level differs in 2.
The simplest way is to compare a COUNT per ID with the number of elements in your list:
SELECT
ID
FROM
MyTable
WHERE
NAME IN ('A', 'B', 'C')
GROUP BY
ID
HAVING
COUNT(*) = 3;
Note: ORDER BY isn't needed and goes after the HAVING if needed
Edit, with question update. In MySQL, it's easier to use a separate table for search terms
DROP TABLE IF EXISTS gbn;
CREATE TABLE gbn (ID INT, `name` VARCHAR(100), REV INT);
INSERT gbn VALUES (1, 'A', 0);
INSERT gbn VALUES (1, 'B', 0);
INSERT gbn VALUES (1, 'C', 0);
INSERT gbn VALUES (2, 'A', 1);
INSERT gbn VALUES (2, 'B', 0);
INSERT gbn VALUES (2, 'C', 0);
INSERT gbn VALUES (3, 'A', 0);
INSERT gbn VALUES (3, 'B', 0);
DROP TABLE IF EXISTS gbn1;
CREATE TABLE gbn1 ( `name` VARCHAR(100));
INSERT gbn1 VALUES ('A');
INSERT gbn1 VALUES ('B');
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = (SELECT COUNT(*) FROM gbn1)
AND MIN(gbn.REV) = MAX(gbn.REV);
INSERT gbn1 VALUES ('C');
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = (SELECT COUNT(*) FROM gbn1)
AND MIN(gbn.REV) = MAX(gbn.REV);
Edit 2, without extra table, use a derived (inline) table:
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
(SELECT 'A' AS `name`
UNION ALL SELECT 'B'
UNION ALL SELECT 'C'
) gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = 3 -- matches number of elements in gbn1 derived table
AND MIN(gbn.REV) = MAX(gbn.REV);
Similar to gbn, but allowing for the possibility of duplicate ID/Name combinations:
SELECT ID
FROM MyTable
WHERE NAME IN ('A', 'B', 'C')
GROUP BY ID
HAVING COUNT(DISTINCT NAME) = 3;
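To check these variants against the question's sample rows, here is a sketch using SQLite through Python's sqlite3 module. It goes one step further than the queries above and also rejects IDs that have names outside the list, which is what the expected outputs in the question imply (input A,B gives 3 only). The helper name ids_with_exact_names is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (ID INTEGER, NAME TEXT, REV INTEGER);
INSERT INTO MyTable VALUES
    (1, 'A', 0), (1, 'B', 0), (1, 'C', 0),
    (2, 'A', 1), (2, 'B', 0), (2, 'C', 0),
    (3, 'A', 1), (3, 'B', 1);
""")

def ids_with_exact_names(conn, names):
    """IDs whose set of names equals `names` and whose REV is the same on all rows."""
    marks = ", ".join("?" for _ in names)
    n = len(set(names))
    sql = f"""
        SELECT ID
        FROM MyTable
        GROUP BY ID
        HAVING COUNT(DISTINCT NAME) = ?                                     -- no extra names
           AND COUNT(DISTINCT CASE WHEN NAME IN ({marks}) THEN NAME END) = ? -- all wanted names
           AND MIN(REV) = MAX(REV)                                          -- one REV level
        ORDER BY ID
    """
    return [row[0] for row in conn.execute(sql, [n, *names, n])]

print(ids_with_exact_names(conn, ["A", "B"]))
print(ids_with_exact_names(conn, ["A", "B", "C"]))
```

No search table or derived table is needed here, since the list of names is passed as bound parameters.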
OKAY! I solved my problem! I modified GBN's logic to do it without a search table, using the IN clause.
One flaw with using MAX(REV) = MIN(REV): suppose I have data like this.
ID NAME REV
1 A 0
1 B 1
1 A 1
then when I use a query like
SELECT ID FROM MyTable
WHERE NAME IN ('A', 'B')
GROUP BY ID
HAVING COUNT(*) = 2
AND MIN(REV) = MAX(REV)
it will not show me ID 1, as the min and max are different and the count is 3.
So I simply add another column to the GROUP BY,
so the final query is
SELECT ID FROM MyTable
WHERE NAME IN ('A', 'B')
GROUP BY ID, REV
HAVING COUNT(*) = 2
AND MIN(REV) = MAX(REV)
Thanks to all that helped!