Banking Transactions Transcription
A client requested a query for a dashboard in their online banking web service. It should return a list of all the customer accounts and get their transactions for the current month.
The result should have the following columns: iban/transactions/total
-iban: Account iban
-transactions: list of transaction amount records for a specific account iban:
Record is a transaction amount
Records are separated by a '+' sign
Records are sorted in ascending order of dt
-total: total amount of transactions
The result should be sorted in descending order by total number of transactions, then in descending order by total.
Note:
-Only transactions in the current month should be included in the result.
-The current month is September.
-The ID is INT primary key, Iban is varchar, account_id is INT foreign key (id), dt is datetime, amount is varchar
I am new to this, so here is the best example I can provide for reference:
Accounts
ID iban
1 GT92 GJH2 AYZM
2 MT82 GWLY FWMY
3 GI36 YOPG Y6NQ
Transactions
account_id dt amount
1 2022-08-25 13:59:30 $42.87
1 2022-08-26 19:12:32 $24.04
1 2022-09-05 17:35:29 $70.07
1 2022-09-10 13:09:40 $26.15
1 2022-09-13 16:28:55 $10.15
2 2022-08-26 05:05:38 $82.83
2 2022-09-03 05:12:33 $34.14
2 2022-09-03 17:19:27 $94.94
2 2022-09-04 10:36:07 $69.31
2 2022-09-12 05:15:22 $90.06
2 2022-09-18 14:30:52 $54.85
3 2022-09-25 04:28:37 $45.99
3 2022-08-22 21:12:42 $65.98
3 2022-08-29 04:45:23 $10.99
3 2022-09-02 09:32:25 $98.36
3 2022-09-02 14:58:25 $25.45
3 2022-09-06 21:15:47 $57.98
3 2022-09-10 10:25:26 $37.90
I tried STUFF with FOR XML PATH, plus a convert to money to get the sum per account ID, but I cannot get the results in a single query:
select iban,
STUFF((select '+' +amount from transactions where account_id = id
for XML PATH('')),1,1,'')[transactions]
from Accounts
order by id;
select
SUM((case when isnumeric([amount])=1 then convert(money,[amount]) else 0 end)) as Transactions from transactions
group by account_id;
Nest the aggregate query as a derived table and JOIN it on the ID and account_id fields. This means SELECTing account_id in the aggregate query. Also include a record count in the aggregate query, and add ORDER BY dt to the STUFF() subquery.
SELECT iban, TransTotal,
STUFF((SELECT '+' +amount FROM transactions WHERE account_id = id ORDER BY dt
FOR XML PATH('')),1,1,'')[transactions]
FROM Accounts INNER JOIN (SELECT account_id, Count(account_id) AS TransCount,
SUM((CASE WHEN isnumeric([amount])=1 THEN convert(money,[amount]) ELSE 0 END)) AS TransTotal
FROM transactions
GROUP BY account_id) AS T
ON Accounts.id = T.account_id
ORDER BY TransCount DESC, TransTotal DESC;
Add filter criteria for year/month in both STUFF and aggregate query. Format() function is one way.
Format(dt, 'yyyyMM') = Format(GetDate(), 'yyyyMM')
Instead of GetDate(), you could use a static parameter or a value supplied by the user.
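The same "current month" test can also be written as a half-open date range instead of formatting every dt value. Here is a minimal sketch in Python; month_bounds is a hypothetical helper introduced only for illustration and is not part of the answer above. In SQL the equivalent filter would compare dt against the two bounds (dt >= start AND dt < end).

```python
from datetime import date, datetime


def month_bounds(today):
    """Return (start, end) so that start <= dt < end covers the month of `today`."""
    start = datetime(today.year, today.month, 1)
    if today.month == 12:
        # December rolls over into January of the next year.
        end = datetime(today.year + 1, 1, 1)
    else:
        end = datetime(today.year, today.month + 1, 1)
    return start, end


# Hypothetical "today" inside the month from the question (September 2022).
start, end = month_bounds(date(2022, 9, 15))

# A timestamp on the last second of September falls inside the range.
in_month = start <= datetime(2022, 9, 30, 23, 59, 59) < end
```

Because the end bound is exclusive, a transaction at midnight on the first day of the next month is not counted.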
Another approach:
SELECT iban, Transactions, TransTotal FROM Accounts INNER JOIN
(SELECT account_id, Count(account_id) AS TransCount,
STRING_AGG(amount, '+') WITHIN GROUP (ORDER BY dt) AS Transactions,
SUM((CASE WHEN isnumeric([amount])=1 THEN convert(money,[amount]) ELSE 0 END)) AS TransTotal
FROM transactions
WHERE Format(dt, 'yyyyMM')=Format(GetDate(), 'yyyyMM')
GROUP BY account_id) AS T
ON Accounts.id = T.account_id
ORDER BY TransCount DESC, TransTotal DESC
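To check what the query should return for the sample data, the logic can be replayed outside the database. The following is only a sketch in Python under a few assumptions: monthly_report is a hypothetical name, the month is passed in explicitly as September 2022, and amounts are parsed by stripping the leading '$'. Like the INNER JOIN above, it leaves out accounts with no transactions in the month.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

accounts = {1: "GT92 GJH2 AYZM", 2: "MT82 GWLY FWMY", 3: "GI36 YOPG Y6NQ"}

transactions = [
    (1, "2022-08-25 13:59:30", "$42.87"), (1, "2022-08-26 19:12:32", "$24.04"),
    (1, "2022-09-05 17:35:29", "$70.07"), (1, "2022-09-10 13:09:40", "$26.15"),
    (1, "2022-09-13 16:28:55", "$10.15"), (2, "2022-08-26 05:05:38", "$82.83"),
    (2, "2022-09-03 05:12:33", "$34.14"), (2, "2022-09-03 17:19:27", "$94.94"),
    (2, "2022-09-04 10:36:07", "$69.31"), (2, "2022-09-12 05:15:22", "$90.06"),
    (2, "2022-09-18 14:30:52", "$54.85"), (3, "2022-09-25 04:28:37", "$45.99"),
    (3, "2022-08-22 21:12:42", "$65.98"), (3, "2022-08-29 04:45:23", "$10.99"),
    (3, "2022-09-02 09:32:25", "$98.36"), (3, "2022-09-02 14:58:25", "$25.45"),
    (3, "2022-09-06 21:15:47", "$57.98"), (3, "2022-09-10 10:25:26", "$37.90"),
]


def monthly_report(accounts, transactions, year, month):
    """Return (iban, 'a+b+c', total) rows for one month, busiest accounts first."""
    per_account = defaultdict(list)
    for account_id, dt, amount in transactions:
        when = datetime.strptime(dt, "%Y-%m-%d %H:%M:%S")
        if (when.year, when.month) == (year, month):
            per_account[account_id].append((when, amount))

    rows = []
    for account_id, items in per_account.items():
        items.sort(key=lambda item: item[0])  # ascending order of dt
        amounts = [amount for _, amount in items]
        # Decimal avoids floating-point drift when adding currency values.
        total = sum((Decimal(a.lstrip("$")) for a in amounts), Decimal("0"))
        rows.append((accounts[account_id], "+".join(amounts), total, len(amounts)))

    # Transaction count descending, then total descending.
    rows.sort(key=lambda r: (r[3], r[2]), reverse=True)
    return [(iban, txns, total) for iban, txns, total, _ in rows]


report = monthly_report(accounts, transactions, 2022, 9)
for iban, txns, total in report:
    print(iban, txns, total)
```

For the sample data, accounts 2 and 3 both have five September transactions, so the total decides their order.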
Related
I'm creating a table with the earliest 3 purchases by customer along with the total count of purchases by said customer, using a CTE. I did this successfully with the query below, but it shows 3 rows for each user with a row for the first purchase date, 2nd purchase date, and 3rd purchase date as separate rows. I'm trying to show the 3 purchase dates as columns, with one row for each user, instead.
This table has hundreds of rows so I can't write the needed user IDs in the code. Any ideas? Is there a way to merge 3 CTEs or write code to return the earliest payment date, 2nd earliest, 3rd earliest, and total amount for the user as columns? Current code is below:
WITH cte_2
AS (SELECT customer_id,
payment_date,
Row_number()
OVER (
partition BY customer_id
ORDER BY payment_date ASC) AS purchase_number
FROM payment)
SELECT cte_2.customer_id,
cte_2.payment_date,
cte_2.purchase_number,
Count(payment_id) AS total_payments
FROM payment
INNER JOIN cte_2
ON payment.customer_id = cte_2.customer_id
WHERE purchase_number <= 3
GROUP BY cte_2.customer_id,
cte_2.payment_date,
purchase_number
ORDER BY customer_id ASC
Current Output with above code:
Preferred Output:
Using pandas you can use pivot:
df = df.set_index('customer_id')
pivot_df = df.pivot(columns='purchase_number', values='payment_dates')
# To improve readability of your columns you can add a prefix:
pivot_df = pivot_df.add_prefix('payment_')
pivot_df.merge(df['total_payments'], left_index=True, right_index=True).drop_duplicates()
When using:
import pandas as pd

df = pd.DataFrame({
'customer_id':[1,1,1,2,2,2,3],
'payment_dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-01'],
'purchase_number':[1,2,3,1,2,3,1],
'total_payments':[4,4,4,26,26,26,1]})
Our result is:
payment_1 payment_2 payment_3 total_payments
customer_id
1 2021-01-01 2021-01-02 2021-01-03 4
2 2021-01-04 2021-01-05 2021-01-06 26
3 2021-01-01 NaN NaN 1
If your SQL product supports CASE WHEN, you can do it with conditional aggregation:
WITH
cte_2
AS (SELECT payment_id,
payment_date,
Row_number()
OVER (
partition BY customer_id
ORDER BY payment_date ASC) AS purchase_number
FROM payment)
SELECT pmt.customer_id,
-- Max() returns the one payment_date that carries each purchase number
Max(case when cte_2.purchase_number=1 then cte_2.payment_date end) as [First Payment],
Max(case when cte_2.purchase_number=2 then cte_2.payment_date end) as [2nd Payment],
Max(case when cte_2.purchase_number=3 then cte_2.payment_date end) as [3rd Payment],
Count(pmt.payment_id) AS total_payments
FROM payment pmt
LEFT JOIN cte_2
ON pmt.payment_id=cte_2.payment_id
and cte_2.purchase_number <= 3
GROUP BY pmt.customer_id
ORDER BY pmt.customer_id ASC
The CTE simply assigns a payment number to each payment. We then join the payment table to that CTE by payment id using a left join, because an inner join would remove payments with a payment number > 3, and we still want to count them.
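The idea of turning the first three purchases into columns while still counting every payment can be sketched in plain Python as well. This is only an illustration under assumptions: first_purchases is a hypothetical name, and the sample payments are made up, with dates stored as ISO strings.

```python
from collections import defaultdict

# Hypothetical (customer_id, payment_date) rows; not taken from the question.
payments = [
    (1, "2021-01-03"), (1, "2021-01-01"), (1, "2021-01-02"), (1, "2021-01-09"),
    (2, "2021-01-04"),
]


def first_purchases(payments, n=3):
    """Return one row per customer: id, first n payment dates, total payment count."""
    by_customer = defaultdict(list)
    for customer_id, payment_date in payments:
        by_customer[customer_id].append(payment_date)

    rows = []
    for customer_id in sorted(by_customer):
        dates = sorted(by_customer[customer_id])  # ISO dates sort chronologically as text
        first_n = dates[:n]
        first_n += [None] * (n - len(first_n))  # pad customers with fewer than n payments
        rows.append((customer_id, *first_n, len(dates)))
    return rows


rows = first_purchases(payments)
```

The total is taken from the full list of dates, not from the first three, which mirrors why the SQL above keeps all payments in the join.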
I am doing a cohort analysis using the table TRANSACTIONS. Below is the table schema,
USER_ID NUMBER,
PAYMENT_DATE_UTC DATE,
IS_PAYMENT_ADDED BOOLEAN
Below is a quick query to see how USER_ID 12345 (an example) goes through the different cohorts based on the date filter provided,
WITH RESULT AS (
SELECT
USER_ID,
TO_DATE(PAYMENT_DATE_UTC) AS PAYMENT_DATE,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY 1,2
HAVING PAYMENT_ADDED_COUNT>=1
ORDER BY 2
)
SELECT
COUNT(DISTINCT r.USER_ID),
SUM(r.PAYMENT_ADDED_COUNT)
FROM RESULT r
WHERE r.USER_ID=12345
AND (r.PAYMENT_DATE>='2021-02-01' AND r.PAYMENT_DATE<'2021-02-15')
The result for this query with the time frame (two weeks) would be
| 1 | 55 |
and this USER_ID would be classified as a Regular User Cohort (one who has made more than 10 payments) for the provided date filter
If the same query is run with the time frame as just one day say '2021-02-07', the result would be
| 1 | 10 |
and this USER_ID would be classified as an Occasional User Cohort (one who has made between 1 and 10 payments) for the provided date filter.
I have this below query to bucket the USER_ID's into the two different cohorts based on the sum of the payments added,
WITH
ALL_USER_COHORT AS
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
),
OCASSIONAL_USER_COHORT AS
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
HAVING (PAYMENT_ADDED_COUNT>=1 AND PAYMENT_ADDED_COUNT<=10)
),
REGULAR_USER_COHORT AS
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
HAVING PAYMENT_ADDED_COUNT>10
)
SELECT
COUNT(DISTINCT ou.USER_ID) AS "OCCASIONAL USERS",
COUNT(DISTINCT ru.USER_ID) AS "REGULAR USERS"
FROM ALL_USER_COHORT au
LEFT JOIN OCASSIONAL_USER_COHORT ou ON au.USER_ID=ou.USER_ID
LEFT JOIN REGULAR_USER_COHORT ru ON au.USER_ID=ru.USER_ID
LEFT JOIN TRANSACTIONS t ON au.USER_ID=t.USER_ID
WHERE au.USER_ID=12345
AND TO_DATE(t.PAYMENT_DATE_UTC)>='2021-02-07'
Ideally the USER_ID 12345 should be bucketed as "OCCASIONAL USERS" as per the provided date filter but the query buckets it as "REGULAR USERS" instead.
For starters, your CTEs could have the redundancy removed like so:
WITH all_user_cohort AS (
SELECT
USER_ID,
SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
FROM transactions
GROUP BY user_id
), ocassional_user_cohort AS (
SELECT * FROM all_user_cohort
WHERE PAYMENT_ADDED_COUNT between 1 AND 10
), regular_user_cohort AS (
SELECT * FROM all_user_cohort
WHERE PAYMENT_ADDED_COUNT > 10
)
SELECT
COUNT(DISTINCT ou.user_id) AS "OCCASIONAL USERS",
COUNT(DISTINCT ru.user_id) AS "REGULAR USERS"
FROM all_user_cohort AS au
LEFT JOIN ocassional_user_cohort ou ON au.user_id=ou.user_id
LEFT JOIN regular_user_cohort ru ON au.user_id=ru.user_id
LEFT JOIN transactions t ON au.user_id=t.user_id
WHERE au.user_id=12345
AND TO_DATE(t.payment_date_utc)>='2021-03-01'
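But the reason you are getting this problem is that the cohort CTEs decide which cohort each user belongs in across all time; the date filter is only applied afterwards, to the joined transactions rows.
What you want is to move the date filter into all_user_cohort. You also don't need to build separate cohort tables when you can just sum the number of rows that meet each condition.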
WITH all_user_cohort AS (
SELECT
USER_ID,
SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
FROM transactions
WHERE TO_DATE(payment_date_utc)>='2021-03-01'
GROUP BY user_id
)
SELECT
SUM(IFF(payment_added_count between 1 AND 10, 1,0)) AS "OCCASIONAL USERS",
SUM(IFF(payment_added_count > 10, 1,0)) AS "REGULAR USERS"
FROM all_user_cohort au
WHERE au.user_id=12345
It can also be done differently, in case this form suits you better for other reasons:
WITH all_user_cohort AS (
SELECT
USER_ID,
SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
FROM transactions
WHERE TO_DATE(payment_date_utc)>='2021-03-01'
GROUP BY user_id
), classify_users AS (
SELECT user_id
,CASE
WHEN payment_added_count between 1 AND 10 THEN 'OCCASIONAL USERS'
WHEN payment_added_count > 10 THEN 'REGULAR USERS'
ELSE 'users with zero payments'
END AS classified
FROM all_user_cohort
)
SELECT classified
,count(*)
FROM classify_users
WHERE user_id=12345
GROUP BY 1
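The key point, filtering by date before counting and classifying, can be illustrated with a small Python sketch. Everything here is an assumption for illustration: classify is a hypothetical function, and the rows are made up so that user 12345 has 10 payments on 2021-02-07 and 55 in the first two weeks of February, matching the counts in the question.

```python
from collections import Counter
from datetime import date

# Hypothetical (user_id, payment_date, is_payment_added) rows.
transactions = (
    [(12345, date(2021, 2, 7), True)] * 10
    + [(12345, date(2021, 2, 3), True)] * 45
)


def classify(transactions, start, end):
    """Count added payments with start <= date < end, then bucket each user."""
    counts = Counter()
    for user_id, payment_date, added in transactions:
        if added and start <= payment_date < end:
            counts[user_id] += 1
    # Users with no added payments in the window never enter `counts`.
    return {
        user_id: "REGULAR USERS" if n > 10 else "OCCASIONAL USERS"
        for user_id, n in counts.items()
    }


two_weeks = classify(transactions, date(2021, 2, 1), date(2021, 2, 15))
one_day = classify(transactions, date(2021, 2, 7), date(2021, 2, 8))
```

The same user lands in a different cohort depending on the window, which is only possible when the filter runs before the count.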
I have a checking account table that contains columns Cust_id (customer id), Open_Date (start date), and Closed_Date (end date). There is one row for each account. A customer can open multiple accounts at any given point. I would like to know how long the person has been a customer.
eg 1:
CREATE TABLE [Cust]
(
[Cust_id] [varchar](10) NULL,
[Open_Date] [date] NULL,
[Closed_Date] [date] NULL
)
insert into [Cust] values ('a123', '10/01/2019', '10/15/2019')
insert into [Cust] values ('a123', '10/12/2019', '11/01/2019')
Ideally I would like to insert this into a table with just one row that says this person has been a customer from 10/01/2019 to 11/01/2019 (as he opened his second account before he closed his previous one).
Similarly eg 2:
insert into [Cust] values ('b245', '07/01/2019', '09/15/2019')
insert into [Cust] values ('b245', '10/12/2019', '12/01/2019')
I would like to see 2 rows in this case- one that shows he was a customer from 07/01 to 09/15 and then again from 10/12 to 12/01.
Can you point me to the best way to get this?
I would approach this as a gaps and islands problem. You want to group together adjacent rows whose periods overlap.
Here is one way to solve it using lag() and a cumulative sum(). Every time the open date is greater than the closed date of the previous record, a new group starts.
select
cust_id,
min(open_date) open_date,
max(closed_date) closed_date
from (
select
t.*,
sum(case when not open_date <= lag_closed_date then 1 else 0 end)
over(partition by cust_id order by open_date) grp
from (
select
t.*,
lag(closed_date) over (partition by cust_id order by open_date) lag_closed_date
from cust t
) t
) t
group by cust_id, grp
In this db fiddle with your sample data, the query produces:
cust_id | open_date | closed_date
:------ | :--------- | :----------
a123 | 2019-10-01 | 2019-11-01
b245 | 2019-07-01 | 2019-09-15
b245 | 2019-10-12 | 2019-12-01
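For comparison, the same grouping can be sketched procedurally in Python. Note that this is a variation and not a translation of the query above: it compares each open date with the latest closed date seen so far in the current group, not just with the previous row's closed date, so a short account nested inside a longer one does not start a new group. merge_periods is a hypothetical name, and the sketch assumes every account has a closed date.

```python
from datetime import date

# Sample rows from the question: (cust_id, open_date, closed_date).
cust = [
    ("a123", date(2019, 10, 1), date(2019, 10, 15)),
    ("a123", date(2019, 10, 12), date(2019, 11, 1)),
    ("b245", date(2019, 7, 1), date(2019, 9, 15)),
    ("b245", date(2019, 10, 12), date(2019, 12, 1)),
]


def merge_periods(rows):
    """Merge overlapping account periods per customer into continuous spans."""
    merged = []
    # Sorting tuples orders by customer first, then by open date.
    for cust_id, opened, closed in sorted(rows):
        last = merged[-1] if merged else None
        if last and last[0] == cust_id and opened <= last[2]:
            # Overlaps the current span: extend its end if this account closes later.
            last[2] = max(last[2], closed)
        else:
            # Different customer or a gap: start a new span.
            merged.append([cust_id, opened, closed])
    return [tuple(span) for span in merged]


spans = merge_periods(cust)
```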
I would solve this with recursion. While this is certainly very heavy, it should accommodate even the most complex account timings (assuming your data has such). However, if the sample data provided is as complex as you need to solve for, I highly recommend sticking with the solution provided above. It is much more concise and clear.
WITH x (cust_id, open_date, closed_date, lvl, grp) AS (
SELECT cust_id, open_date, closed_date, 1, 1
FROM (
SELECT cust_id
, open_date
, closed_date
, row_number()
OVER (PARTITION BY cust_id ORDER BY closed_date DESC, open_date) AS rn
FROM cust
) AS t
WHERE rn = 1
UNION ALL
SELECT cust_id, open_date, closed_date, lvl, grp
FROM (
SELECT c.cust_id
, c.open_date
, c.closed_date
, x.lvl + 1 AS lvl
, x.grp + CASE WHEN c.closed_date < x.open_date THEN 1 ELSE 0 END AS grp
, row_number() OVER (PARTITION BY c.cust_id ORDER BY c.closed_date DESC) AS rn
FROM cust c
JOIN x
ON x.cust_id = c.cust_id
AND c.open_date < x.open_date
) AS t
WHERE t.rn = 1
)
SELECT cust_id, min(open_date) AS first_open_date, max(closed_date) AS last_closed_date
FROM x
GROUP BY cust_id, grp
ORDER BY cust_id, grp
I would also add the caveat that I don't run on SQL Server, so there could be syntax differences that I didn't account for. Hopefully they are minor, if present.
You can try something like this:
select distinct
cust_id,
(select min(Open_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
),
(select max(Closed_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
)
from Cust as a
So, for every row, you select the minimum and maximum dates from all ranges that overlap it; DISTINCT then filters out the duplicates.
I want to count the number of transactions for the first 30 days from an account's creation for all accounts. The issue is not all accounts were created at the same time.
Example: [Acct_createdTable]
Acct Created_date
909099 01/02/2015
878787 02/03/2003
676767 09/03/2013
I can't Declare a datetime variable since it can only take one datetime.
and I can't do :
Select acctnumber,min,count(*)
from transaction_table
where transactiondate between (
select Created_date from Acct_createdTable where Acct = 909099)
and (
select Created_date from Acct_createdTable where Acct = 909099)+30
Since then it'll only count the number of transaction for only one acct.
What I want for my output is.
Acct First_30_days_count
909099 23
878787 190
676767 23
I think what you're looking for is a basic GROUP BY query.
SELECT
ac.Acct,
COUNT(td.acctnumber) AS First_30_days_count
FROM Acct_createdTable ac
LEFT JOIN transaction_table td ON
td.acctnumber = ac.Acct
AND
td.transactiondate BETWEEN ac.Created_date AND DATEADD(DAY, 30, ac.Created_date)
GROUP BY
ac.Acct
This should return number of transactions within first 30 days for each account. This of course is pseudocode as you didn't state your database platform. The left join will ensure that accounts with no transactions in that period will get displayed.
An alternative solution would be to use outer apply like this:
select a.acct, o.First_30_days_count
from acct_createdtable a
outer apply (
select count(*) First_30_days_count
from transaction_table
where acctnumber = a.acct
and transactiondate between a.created_date and dateadd(day, 30, a.created_date)
) o;
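Both answers rely on each account having its own 30-day window that starts at its creation date. A minimal Python sketch of that idea follows; first_30_days_counts is a hypothetical name, the transactions are made up, and the creation dates from the question are assumed to be in MM/DD/YYYY form.

```python
from datetime import date, timedelta

# Creation dates, assuming MM/DD/YYYY in the question's example.
created = {909099: date(2015, 1, 2), 676767: date(2013, 9, 3)}

# Hypothetical (acctnumber, transactiondate) rows.
transactions = [
    (909099, date(2015, 1, 2)),   # creation day itself: counted
    (909099, date(2015, 2, 1)),   # exactly 30 days later: counted (inclusive, like BETWEEN)
    (909099, date(2015, 2, 2)),   # 31 days later: not counted
    (555555, date(2015, 1, 5)),   # account not in `created`: ignored
]


def first_30_days_counts(created, transactions):
    """Count transactions per account within 30 days of that account's creation."""
    # Start every known account at zero so accounts without transactions still appear.
    counts = {acct: 0 for acct in created}
    for acct, when in transactions:
        start = created.get(acct)
        if start is not None and start <= when <= start + timedelta(days=30):
            counts[acct] += 1
    return counts


counts = first_30_days_counts(created, transactions)
```

Initialising every account to zero plays the same role as the LEFT JOIN or OUTER APPLY: accounts with no transactions in the window still show up.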
I have the below data in a table A which I need to insert into table B along with one computed column.
TABLE A:
Account_No | Balance | As_on_date
1001 |-100 | 1-Jan-2013
1001 |-150 | 2-Jan-2013
1001 | 200 | 3-Jan-2013
1001 |-250 | 4-Jan-2013
1001 |-300 | 5-Jan-2013
1001 |-310 | 6-Jan-2013
Table B:
Table B should show the number of days for which the balance has been negative and
the date on which it went negative.
So, for 6-Jan-2013, this table should show below data:
Account_No | Balance | As_on_date | Days_passed | Start_date
1001 | -310 | 6-Jan-2013 | 3 | 4-Jan-2013
Here, the number of days should count from when the balance most recently went negative,
not from the older entry.
I need to write a SQL query to get the no of days passed and the start date from when the
balance has gone negative.
I tried to formulate a query using Lag analytical function, but I am not succeeding.
How should I check the first instance of negative balance by traversing back using LAG function?
I also tried the first_value function, but I could not work out how to partition by negative value.
Any help or direction on this will be really helpful.
Here's a way to achieve this using analytical functions.
INSERT INTO tableb
WITH tablea_grouped1
AS (SELECT account_no,
balance,
as_on_date,
SUM (CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
OVER (PARTITION BY account_no ORDER BY as_on_date)
grp
FROM tablea),
tablea_grouped2
AS (SELECT account_no,
balance,
as_on_date,
grp,
LAST_VALUE (
balance)
OVER (
PARTITION BY account_no, grp
ORDER BY as_on_date
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
closing_balance
FROM tablea_grouped1
WHERE balance < 0
AND grp != 0 --keep this, if starting negative balance is to be ignored
)
SELECT account_no,
closing_balance,
MAX (as_on_date),
MAX (as_on_date) - MIN (as_on_date) + 1,
MIN (as_on_date)
FROM tablea_grouped2
GROUP BY account_no, grp, closing_balance
ORDER BY account_no, MIN (as_on_date);
First, SUM is used as analytical function to assign group number to consecutive balances less than 0.
LAST_VALUE function is then used to find the last -ve balance in each group
Finally, the result is aggregated based on each group. MAX(date) gives the last date, MIN(date) gives the starting date, and the difference of the two gives number of days.
Demo at sqlfiddle.
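The expected row for 6-Jan-2013 can be cross-checked with a short procedural sketch. This is not the query above, only an illustration in Python for a single account: current_negative_streak is a hypothetical name, and it walks backwards from the latest date until it meets a non-negative balance.

```python
from datetime import date

# Table A from the question for account 1001: (as_on_date, balance).
balances = [
    (date(2013, 1, 1), -100), (date(2013, 1, 2), -150), (date(2013, 1, 3), 200),
    (date(2013, 1, 4), -250), (date(2013, 1, 5), -300), (date(2013, 1, 6), -310),
]


def current_negative_streak(balances):
    """Return (balance, as_on_date, days_passed, start_date) for the latest negative run."""
    rows = sorted(balances)  # ascending by date
    if not rows or rows[-1][1] >= 0:
        return None  # the latest balance is not negative, so there is no current streak
    start = rows[-1][0]
    for as_on_date, balance in reversed(rows):
        if balance >= 0:
            break  # reached the last non-negative day; the streak starts after it
        start = as_on_date
    last_date, last_balance = rows[-1]
    # +1 so that both the start date and the latest date are counted.
    return last_balance, last_date, (last_date - start).days + 1, start


result = current_negative_streak(balances)
```

For the sample data the walk stops at 3-Jan-2013, so the earlier negative balances on 1-Jan and 2-Jan are ignored, as the question requires.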
Try this, and use gone_negative to compute the required column value when inserting into the other table:
select temp.account_no,
temp.balance,
temp.prev_balance,
temp.on_date,
temp.prev_on_date,
case
WHEN (temp.balance < 0 and temp.prev_balance >= 0) THEN
1
else
0
end as gone_negative
from (select account_no,
balance,
on_date,
lag(balance, 1, 0) OVER(partition by account_no ORDER BY on_date) prev_balance,
lag(on_date, 1) OVER(partition by account_no ORDER BY on_date) prev_on_date
from tblA
order by account_no) temp;
Hope this helps pal.
Here's one way to do it.
Select all records from my_table where the balance is positive.
Do a self-join to get all the records whose as_on_date is later than the current row's and whose balance is negative.
Once we have these, we cut off the rows where the as_on_date difference between the current and the previous row is > 1. We then filter the results in an outer subquery.
The final select just groups the filtered rows and gets their min and max values.
Query:
SELECT
account_no,
min(case when row_number = 1 then balance end) as balance,
max(mt2_date) as As_on_date,
max(mt2_date) - mt1_date as Days_passed,
min(mt2_date) as Start_date
FROM
(
SELECT
*,
MIN(break_date) OVER( PARTITION BY mt1_date ) AS min_break_date,
ROW_NUMBER() OVER( PARTITION BY mt1_date ORDER BY mt2_date desc ) AS row_number
FROM
(
SELECT
mt1.account_no,
mt2.balance,
mt1.as_on_date as mt1_date,
mt2.as_on_date as mt2_date,
case when mt2.as_on_date - lag(mt2.as_on_date,1) over () > 1 then mt2.as_on_date end as break_date
FROM
my_table mt1
JOIN my_table mt2 ON ( mt2.balance < mt1.balance AND mt2.as_on_date > mt1.as_on_date )
WHERE
MT1.balance > 0
order by
mt1.as_on_date,
mt2.as_on_date ) sub_query
) T
WHERE
min_break_date is null
OR mt2_date < min_break_date
GROUP BY
mt1_date,
account_no
SQLFIDDLE
I have added a few more rows in the fiddle, just to test it out.