Operation on each row from select query - sql

How can I execute an additional query (an UPDATE) on each row returned by a SELECT?
I have to take the amount from each row of the select and send it to the user's balance table.
Example:
status 0 - open
status 1 - processed
status 2 - closed
My select statement:
select id, user_id, sell_amount, sell_currency_id
from (select id, user_id, sell_amount, sell_currency_id,
sum(sell_amount)
over (order by buy_amount/sell_amount ASC, date_add ASC) as cumsell
from market t
where (status = 0 or status = 1) and type = 0
) t
where 0 <= cumsell and 7 > cumsell - sell_amount;
Select result from the market table:
id;user_id;amount;status
4;1;1.00000000;0
6;2;2.60000000;0
5;3;2.00000000;0
7;4;4.00000000;0
We take a total amount of 7 and send it to the user's balance table:
id;user_id;amount;status
4;1;0.00000000;2 -- took 1, sum 1, status changed to 2
6;2;0.00000000;2 -- took 2.6, sum=3.6, status changed to 2
5;3;0.00000000;2 -- took 2, sum 5.6, status changed to 2
7;4;2.60000000;1 -- took 1.4, sum 7.0, status changed to 1 (because 2.6 is left to close)
User's balance table
user_id;balance
5;7 -- added 7 from previous operation
Postgres version 9.3

The general principle is to use UPDATE ... FROM over a subquery. Your example is too hard to turn into useful CREATE TABLE and SELECT statements, so I've made up a quick dummy dataset:
CREATE TABLE balances (user_id integer, balance numeric);
INSERT INTO balances (user_id, balance) VALUES (1,0), (2, 2.1), (3, 99);
CREATE TABLE transactions (user_id integer, amount numeric, applied boolean default 'f');
INSERT INTO transactions (user_id, amount) VALUES (1, 22), (1, 10), (2, -10), (4, 1000000);
If you wanted to apply the transactions to the balances you would do something like:
BEGIN;
LOCK TABLE balances IN EXCLUSIVE MODE;
LOCK TABLE transactions IN EXCLUSIVE MODE;
UPDATE balances SET balance = balance + t.amount
FROM (
SELECT t2.user_id, sum(t2.amount) AS amount
FROM transactions t2
GROUP BY t2.user_id
) t
WHERE balances.user_id = t.user_id;
UPDATE transactions
SET applied = true
FROM balances b
WHERE transactions.user_id = b.user_id;
COMMIT;
The LOCK statements are important for correctness in the presence of concurrent inserts/updates.
The second UPDATE marks the transactions as applied; you might not need something like that in your design.
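Mapping the same principle onto your market table is more involved because each row's new amount and status depend on the running sum. A hedged sketch (untested, reusing your window query; the balances table name and the hard-coded total of 7 and user 5 are assumptions taken from your example):
BEGIN;
LOCK TABLE market IN EXCLUSIVE MODE;
UPDATE market m
SET sell_amount = CASE WHEN t.cumsell <= 7 THEN 0 ELSE t.cumsell - 7 END,
    status      = CASE WHEN t.cumsell <= 7 THEN 2 ELSE 1 END
FROM (SELECT id, sell_amount,
             sum(sell_amount) OVER (ORDER BY buy_amount/sell_amount ASC, date_add ASC) AS cumsell
      FROM market
      WHERE status IN (0, 1) AND type = 0) t
WHERE m.id = t.id
  AND t.cumsell - t.sell_amount < 7; -- only rows that contribute to the first 7
-- credit the buyer in the same transaction (balances schema assumed)
UPDATE balances SET balance = balance + 7 WHERE user_id = 5;
COMMIT;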

Related

Matching array of rows with multiple conditions

I have tables transaction and transactionaction with the following columns; transactionid in transactionaction is a foreign key to transaction.
transactionid;name
1;Trans 1
2;Trans 2
3;Trans 3
4;Trans 4
actionid;transactionid;actiontype;value
1;1;1;null
2;1;2;null
3;1;3;null
4;2;1;1
5;2;2;null
6;2;3;null
I need to find every transaction that contains all the actions passed by the user. It's important to note that some actions can be filtered only on actionType (actions 2 and 3) and some also on value (action 1).
So if the user wants to find transactions with actions 2 and 3, he should get transactions 1 and 2. For this case, with the help of this answer https://stackoverflow.com/a/41140629/12035106, I created this query:
SELECT * from transaction
WHERE transactionid in (
SELECT transactionid
FROM public.transactionaction
group by transactionid
having array_agg(actiontype) @> array[2,3]
)
However, action 1 needs to take value into consideration, because action 1 with a null value is different from action 1 with a non-null value. In this case I cannot really use the previous query, so I came up with this one:
SELECT * from transaction
WHERE transactionid in (
SELECT transactionid
FROM public.transactionaction
group by transactionid
having array_agg(actiontype) @> array[2,3]
) AND transactionid in (
SELECT transactionid
FROM public.transactionaction
WHERE actiontype = 1 AND value is not null
)
This one works; as a result I get only transaction 2. But I feel I have overcomplicated it, because this query loops over the same table multiple times. I created an index on actiontype, so the query plan looks better, but maybe there is an easier way to achieve the same result?
I consider that you have an initial design problem; even so, I will try to help you with the query.
You can work with a subquery to know whether a transaction has an actionType 1 with a non-null value (by means of an aggregate function with an OVER clause, to avoid a GROUP BY); with that you can then filter for actionTypes 2 and 3.
I tried to simulate your scenario:
drop table if exists transaction;
create temp table transaction(
transactionId bigint,
name varchar
) on commit drop;
insert into transaction(transactionId, name) values(1, 'Trans 1');
insert into transaction(transactionId, name) values(2, 'Trans 2');
drop table if exists transactionAction;
create temp table transactionAction(
actionId bigint,
transactionId bigint,
actionType bigint,
value bigint
) on commit drop;
insert into transactionAction(actionId, transactionId, actionType, value) values(1, 1, 1, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(2, 1, 2, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(3, 1, 3, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(4, 2, 1, 1);
insert into transactionAction(actionId, transactionId, actionType, value) values(5, 2, 2, null);
insert into transactionAction(actionId, transactionId, actionType, value) values(6, 2, 3, null);
The query would be:
select
distinct
t.*
from
(
select
ta.actionid,
ta.transactionId,
ta.actionType,
ta.value,
/*sum only if action type 1 is different from null*/
sum(case when ta.actionType = 1 and not ta.value is null then 1 else 0 end) over(partition by ta.transactionId) haveActionOne
from
transactionAction ta
) a
inner join transaction t
on a.transactionId = t.transactionId
where
/*indicates if the transaction in its actions has type 1 different from null in its value*/
a.haveActionOne = 1
and a.actionType in (2,3);
As an option, it would be more elegant to move the condition for action 1 into the WHERE clause; this approach eliminates any subqueries:
Select Transaction.transactionid, Max(Transaction.name)
From Transaction Inner Join Transactionaction On (Transaction.transactionid=Transactionaction.transactionid)
Where (actiontype=1 And value Is Not Null) Or actiontype<>1
Group by Transaction.transactionid
Having array_agg(actiontype Order by actiontype) = array[1,2,3]
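A further hedged variant (a sketch against the simulated tables above, assuming transactionid and name identify a transaction): fold both conditions into a single grouped pass, using bool_or for the value check, so the action table is scanned only once:
select t.transactionId, t.name
from transaction t
inner join transactionAction ta on ta.transactionId = t.transactionId
group by t.transactionId, t.name
having array_agg(ta.actionType) @> array[2,3]::bigint[] -- contains actions 2 and 3
   -- and at least one action 1 whose value is not null
   and bool_or(ta.actionType = 1 and ta.value is not null);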

New ID based on shared attribute among rows

I have a table transactions with columns account_id, device_id and card_id. I'd like to create a new column token that acts as a user ID. This new column should have the same value across transactions that share a value in at least one of the 3 columns previously mentioned.
The idea is to create a user identifier that is a mix of account, device and card. This new ID is "stronger" than any of the previous ones on its own, in the sense that, for instance, if someone changes his account or device but not his card, his transactions will still be linked to the previous ones by this new token ID.
What would be a simple way to do this in Redshift SQL?
Example:
transaction_id account_id device_id card_id token
1 1 1 1 1
2 1 2 2 1
3 2 2 3 1
4 3 3 3 1
5 4 4 4 2
6 5 4 5 2
7 6 8 2 1
8 7 7 7 3
In the above example:
T1 has token 1 since it is the first transaction.
T2 has token 1 since linked to T1 by account.
T3 has token 1 since linked to T2 by device.
T4 has token 1 since linked to T3 by card.
T5 has token 2 since none of the 3 fields shares a value with previous transactions (new user).
T6 has token 2 since linked to T5 by device.
T7 has token 1 since linked to T2 by card.
T8 has token 3 since none of the 3 fields shares a value with previous transactions (new user).
UPDATE
I managed to get a solution based on 2 steps:
For each transaction, I update the token column with the transaction_id of the oldest transaction linked to it by any of the 3 IDs (account, device or card). This step doesn't solve the problem on its own because, for instance, T4 should have token 1, but it gets token 2, since the oldest transaction linked to T4 is T2, not T1.
Once each transaction points at its oldest linked transaction, I update all the transactions (those whose token is not their own transaction ID, just for performance since the others don't need it) and replace their token with the token of the linked transaction. By doing this incrementally, one row at a time and in separate SQL transactions, I get the expected output.
Here is the code:
create table sandbox.test(
transaction_id integer,
account_id integer,
device_id integer,
card_id integer
);
insert into sandbox.test values
(1, 1, 1, 1),
(2, 1, 2, 2),
(3, 2, 2, 3),
(4, 3, 3, 3),
(5, 4, 4, 4),
(6, 5, 4, 5),
(7, 6, 8, 2),
(8, 7, 7, 7),
(9, 3, 9, 9),
(10, 10, 9, 9);
alter table sandbox.test
add column token int
default NULL;
-- Step 1
update sandbox.test
set token = A.token
from (
with links as (
select transaction_id,
first_value(transaction_id)
over(partition by account_id order by transaction_id rows unbounded preceding) as account_link,
first_value(transaction_id)
over(partition by device_id order by transaction_id rows unbounded preceding) device_link,
first_value(transaction_id)
over(partition by card_id order by transaction_id rows unbounded preceding) card_link
from sandbox.test
)
select l1.transaction_id, least(l2.account_link, l2.device_link, l2.card_link) token
from links l2
inner join links l1
on l2.transaction_id=least(l1.account_link, l1.device_link, l1.card_link)
) A
where sandbox.test.transaction_id=A.transaction_id;
-- Step 2
CREATE OR REPLACE PROCEDURE refresh_transactions() AS $$
DECLARE
transactions RECORD;
BEGIN
FOR transactions IN SELECT transaction_id FROM sandbox.test ORDER BY transaction_id LOOP
RAISE INFO '%', transactions.transaction_id;
EXECUTE 'update
sandbox.test
set token = (select A.token from sandbox.test A where
A.transaction_id = sandbox.test.token)
where transaction_id='||transactions.transaction_id||';';
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
CALL refresh_transactions();
However, the second part of this solution (the refresh_transactions() call) takes too long to run on my DB with millions of transactions, so it is impossible to implement. Maybe there is a way to do it more efficiently?
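One direction worth exploring (a hedged sketch only; collapse_tokens is a made-up name and this is untested on Redshift): replace the per-row loop with a set-based UPDATE that collapses one level of token indirection per pass and repeats until nothing changes. Each pass follows every pointer one step, so long link chains shrink quickly and only a handful of passes should be needed:
CREATE OR REPLACE PROCEDURE collapse_tokens() AS $$
DECLARE
  changed integer := 1;
BEGIN
  WHILE changed > 0 LOOP
    -- point every row at its token's token, keeping the smaller id
    UPDATE sandbox.test
    SET token = a.token
    FROM sandbox.test a
    WHERE sandbox.test.token = a.transaction_id
      AND a.token < sandbox.test.token;
    GET DIAGNOSTICS changed := ROW_COUNT;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
CALL collapse_tokens();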

avoiding group by for column used in datediff?

As the database is currently constructed, I can only use a date field of a certain table in a DATEDIFF function within a query that is also doing count aggregation (not counting the date field itself, but the entities where that date field is not null). The GROUP BY at the end messes up the counting, since that one entry is counted on its own, as its own group.
In some detail:
Our lead recruiter wants a report that shows the sum of applications and conducted interviews per opening. So far, no problem. Additionally, he would like to see the total duration per opening, from making it public to signing a new employee, and of course only if the opening could already be filled.
I have 4 tables to join:
table 1 holds the data of the opening
table 2 has the single applications
table 3 has the interview data of the applications
table 4 has the data regarding the publication of the openings (with the date when a certain opening was made public)
The problem is the duration requirement. Table 4 holds the starting point, and in table 2 one (or no) applicant per opening has a date field filled with the time he returned a signed contract, at which point the opening counts as filled. When I use that field in a DATEDIFF I'm forced to also put that column in the GROUP BY clause, and that results in 2 rows per opening: one row has all the numbers as wanted, and the second row always holds that one person who has an entry in that date field...
So far I haven't come up with a way to avoid that problem, except for explaining to the colleague that he gets his time-to-fill number in another report.
SELECT
table1.col1 as NameOfProject,
table1.col2 as Company,
table1.col3 as OpeningType,
table1.col4 as ReasonForOpening,
count (table2.col2) as NumberOfApplications,
sum (case when table2.colSTATUS = 'withdrawn' then 1 else 0 end) as NumberOfApplicantsWhoWithdrew,
sum (case when table3.colTypeInterview = 'PhoneInterview' then 1 else 0 end) as NumberOfPhoneInterview,
...more sum columns...,
table1.finished, -- shows '1' if the opening is occupied
DATEDIFF(day, table4.colValidFrom, table2.colContractReceived) as DaysToCompletion
FROM
table2 left join table3 on table2.REF_NR = table3.REF_NR
join table1 on table2.PROJEKT = table1.KBEZ
left join table4 on table1.REFNR = table4.PRJ_REFNR
GROUP BY
table2.colContractReceived
-- ...plus all other columns except the ones in aggregate (sum and count) functions
ORDER BY table1.NameOfProject
Here is a short rebuild of what it looks like: first a row where the opening is not filled and all aggregations come out in one row as wanted; the next project/opening shows up twice, because the field used in the DATEDIFF is grouped independently...
project; company; no_of_applications; no_of_phoneinterview; no_of_personalinterview; ...; time_to_fill_in_days; filled?
2018_312 comp a 27 4 2 null 0
2018_313 comp b 54 7 4 null 0
2018_313 comp b 1 1 1 42 1
I'd be glad to get any idea how to solve this. Thanks for considering my request!
(During the 'translation' of all the specific column and table names I might have built in a syntax error here and there, but the query worked well except for that unwanted extra aggregation per filled opening.)
If I've understood your requirement properly, the issue you are having is that you need to show the number of days between the starting point and the time at which an applicant returned a signed contract, while only showing a single row per opening, whether or not the position was filled.
I've achieved this result by assuming that you count a position as filled using the "ContractsReceived" column. This may be wrong; however, the principle should still provide what you are looking for.
I've essentially wrapped your query in a subquery, performed a rank ordering by the ContractsReceived column descending, partitioned by the project. Then in the outer query I filter for the first instance of this ranking.
Even if my assumption about the column structure and data types is wrong, this should provide you with a model to work with.
The only issue you might have with this ranking solution is if you want to aggregate over both rows within one (so include all of the summed columns for both the position filled and position not filled row per project). If this is the case let me know and we can work around that.
Please let me know if you have any questions.
declare @table1 table (
REFNR int,
NameOfProject nvarchar(20),
Company nvarchar(20),
OpeningType nvarchar(20),
ReasonForOpening nvarchar(20),
KBEZ int
);
declare @table2 table (
NumberOfApplications int,
Status nvarchar(15),
REF_NR int,
ReturnedApplicationDate datetime,
ContractsReceived bit,
PROJEKT int
);
declare @table3 table (
TypeInterview nvarchar(25),
REF_NR int
);
declare @table4 table (
PRJ_REFNR int,
StartingPoint datetime
);
insert into @table1 (REFNR, NameOfProject, Company, OpeningType, ReasonForOpening, KBEZ)
values (1, '2018_312', 'comp a' ,'Permanent', 'Business growth', 1),
(2, '2018_313', 'comp a', 'Permanent', 'Business growth', 2),
(3, '2018_313', 'comp a', 'Permanent', 'Business growth', 3);
insert into @table2 (NumberOfApplications, Status, REF_NR, ReturnedApplicationDate, ContractsReceived, PROJEKT)
values (27, 'Processed', 4, '2018-04-01 08:00', 0, 1),
(54, 'Withdrawn', 5, '2018-04-02 10:12', 0, 2),
(1, 'Processed', 6, '2018-04-15 15:00', 1, 3);
insert into @table3 (TypeInterview, REF_NR)
values ('Phone', 4),
('Phone', 5),
('Personal', 6);
insert into @table4 (PRJ_REFNR, StartingPoint)
values (1, '2018-02-25 08:00'),
(2, '2018-03-04 15:00'),
(3, '2018-03-04 15:00');
select * from
(
SELECT
RANK() OVER(Partition by NameOfProject, Company order by ContractsReceived desc) as rowno,
table1.NameOfProject,
table1.Company,
table1.OpeningType,
table1.ReasonForOpening,
case when ContractsReceived >0 then datediff(DAY, StartingPoint, ReturnedApplicationDate) else null end as TimeToFillInDays,
ContractsReceived Filled
FROM
@table2 table2 left join @table3 table3 on table2.REF_NR = table3.REF_NR
join @table1 table1 on table2.PROJEKT = table1.KBEZ
left join @table4 table4 on table1.REFNR = table4.PRJ_REFNR
group by NameOfProject, Company, OpeningType, ReasonForOpening, ContractsReceived,
StartingPoint, ReturnedApplicationDate
) x where rowno=1
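Since at most one row per opening has colContractReceived populated, a hedged alternative (a sketch using the question's placeholder column names, untested): wrap the date itself in an aggregate so it no longer has to appear in the GROUP BY at all:
SELECT
    table1.col1 as NameOfProject,
    count(table2.col2) as NumberOfApplications,
    -- MAX ignores NULLs, so this picks the single signed-contract date per
    -- opening, or yields NULL (and thus a NULL duration) if it is unfilled
    DATEDIFF(day, MIN(table4.colValidFrom), MAX(table2.colContractReceived)) as DaysToCompletion
FROM
    table2 left join table3 on table2.REF_NR = table3.REF_NR
    join table1 on table2.PROJEKT = table1.KBEZ
    left join table4 on table1.REFNR = table4.PRJ_REFNR
GROUP BY
    table1.col1 -- the contract date column is no longer needed here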

SQL Limit number of references to another table without locking

Is there a technique to avoid locking a row but still be able to limit the number of rows in another table that reference it?
For example:
create table accounts (
id integer,
name varchar,
max_users integer
);
create table users (
id integer,
account_id integer,
email varchar
);
I want to limit the number of users that are part of an account using the max_users value in accounts. Is there a way to ensure that concurrent calls won't create more users than permitted, without locking the accounts row?
Something like this doesn't work, since two concurrent transactions can both see the select count(*) condition as true even when only one more user is permitted:
begin;
insert into users(id, account_id, email)
select 1, 1, 'john@abc.com' where (select count(*) from users where account_id = 1) < (select max_users from accounts where id = 1);
commit;
And the following works, but I'm having performance issues that are mostly caused by transactions waiting for locks:
begin;
select id from accounts where id = 1 for update;
insert into users(id, account_id, email)
select 1, 1, 'john@abc.com' where (select count(*) from users where account_id = 1) < (select max_users from accounts where id = 1);
commit;
EDIT: Bonus question: what if the value is not stored in the database, but is something you can set dynamically?
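One technique worth considering (a hedged, Postgres-specific sketch, not an authoritative answer): serialize inserts per account on an advisory lock instead of on the accounts row. Concurrent inserts for the same account still queue behind each other, but no table row is locked and other accounts are unaffected:
begin;
-- transaction-scoped advisory lock keyed on the account id;
-- released automatically at commit/rollback
select pg_advisory_xact_lock(1); -- 1 = account id
insert into users(id, account_id, email)
select 1, 1, 'john@abc.com' where (select count(*) from users where account_id = 1) < (select max_users from accounts where id = 1);
commit;
For the bonus question, the same shape works when the limit is not stored in the database: replace the subquery on accounts with a literal supplied by the application.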

Running total by grouped records in table

I have a table like this (Oracle 10):
Account Bookdate Amount
1 20080101 100
1 20080102 101
2 20080102 200
1 20080103 -200
...
What I need is a new table grouped by Account, ordered by Account asc and Bookdate asc, with a running total field, like this:
Account Bookdate Amount Running_total
1 20080101 100 100
1 20080102 101 201
1 20080103 -200 1
2 20080102 200 200
...
Is there a simple way to do it?
Thanks in advance.
Do you really need the extra table?
You can get the data you need with a simple query, which you can obviously create as a view if you want it to appear like a table.
This will get you the data you are looking for:
select
account, bookdate, amount,
sum(amount) over (partition by account order by bookdate) running_total
from t
/
This will create a view to show you the data as if it were a table:
create or replace view t2
as
select
account, bookdate, amount,
sum(amount) over (partition by account order by bookdate) running_total
from t
/
If you really need the table, do you mean that you need it constantly updated, or just a one-off? Obviously, if it's a one-off you can just "create table as select" using the above query, as sketched below.
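For the one-off case, a minimal sketch (the table name running_totals is made up here):
create table running_totals as
select
account, bookdate, amount,
sum(amount) over (partition by account order by bookdate) running_total
from t
/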
Test data I used is:
create table t(account number, bookdate date, amount number);
insert into t(account, bookdate, amount) values (1, to_date('20080101', 'yyyymmdd'), 100);
insert into t(account, bookdate, amount) values (1, to_date('20080102', 'yyyymmdd'), 101);
insert into t(account, bookdate, amount) values (1, to_date('20080103', 'yyyymmdd'), -200);
insert into t(account, bookdate, amount) values (2, to_date('20080102', 'yyyymmdd'), 200);
commit;
Edit:
I forgot to add: you specified that you wanted the table to be ordered. This doesn't really make sense, and it makes me think that you really want the query/view; ordering is a property of the query you execute, not something inherent in the table (ignoring index-organised tables and the like).
I'll start with this very important caveat: do NOT create a table to hold this data. If you do, you will find that you need to maintain it, which will become a never-ending headache. Write a view to return the extra column if you want to do that. If you're working with a data warehouse then maybe you would do something like this, but even then, err on the side of a view unless you simply can't get the performance you need with indexes, decent hardware, etc.
Here's a query that will return the rows the way that you need them.
SELECT
Account,
Bookdate,
Amount,
(
SELECT SUM(Amount)
FROM My_Table T2
WHERE T2.Account = T1.Account
AND T2.Bookdate <= T1.Bookdate
) AS Running_Total
FROM
My_Table T1
Another possible solution is:
SELECT
T1.Account,
T1.Bookdate,
T1.Amount,
SUM(T2.Amount)
FROM
My_Table T1
LEFT OUTER JOIN My_Table T2 ON
T2.Account = T1.Account AND
T2.Bookdate <= T1.Bookdate
GROUP BY
T1.Account,
T1.Bookdate,
T1.Amount
Test them both for performance and see which works better for you. Also, I haven't thoroughly tested them beyond the example which you gave, so be sure to test some edge cases.
Use analytics, just like in your last question:
create table accounts
( account number(10)
, bookdate date
, amount number(10)
);
delete accounts;
insert into accounts values (1,to_date('20080101','yyyymmdd'),100);
insert into accounts values (1,to_date('20080102','yyyymmdd'),101);
insert into accounts values (2,to_date('20080102','yyyymmdd'),200);
insert into accounts values (1,to_date('20080103','yyyymmdd'),-200);
commit;
select account
, bookdate
, amount
, sum(amount) over (partition by account order by bookdate asc) running_total
from accounts
order by account,bookdate asc
/
output:
ACCOUNT BOOKDATE AMOUNT RUNNING_TOTAL
---------- -------- ---------- -------------
1 01-01-08 100 100
1 02-01-08 101 201
1 03-01-08 -200 1
2 02-01-08 200 200