Postgres Bank Account Transaction Balance - sql

Here's an example "transactions" table where each row is a record of an amount and the date of the transaction.
+--------+------------+
| amount | date |
+--------+------------+
| 1000 | 2020-01-06 |
| -10 | 2020-01-14 |
| -75 | 2020-01-20 |
| -5 | 2020-01-25 |
| -4 | 2020-01-29 |
| 2000 | 2020-03-10 |
| -75 | 2020-03-12 |
| -20 | 2020-03-15 |
| 40 | 2020-03-15 |
| -50 | 2020-03-17 |
| 200 | 2020-10-10 |
| -200 | 2020-10-10 |
+--------+------------+
The goal is to return one column, "balance", with the balance of all transactions. The only catch is that there is a monthly fee of $5 for each month that does not have at least THREE payment transactions (represented by a negative value in the amount column) that total at least $100. So in the example, the only month without the $5 fee is March, because it had 3 payments (negative amount transactions) that totaled $145. The final balance would therefore be $2,746: the sum of the amounts, $2,801, minus $55 in monthly fees (11 months x $5). I'm not a Postgres expert by any means, so any pointers on how to get started, or on which parts of the Postgres documentation would help most with this problem, would be much appreciated.
The expected output would be:
+---------+
| balance |
+---------+
| 2746 |
+---------+

This is rather complicated. You can calculate the total span of months and then subtract the months in which the fee is waived:
select amount, (extract(year from age) * 12 + extract(month from age)), cnt,
amount - 5 *( extract(year from age) * 12 + extract(month from age) + 1 - cnt) as balance
from (select sum(amount) as amount,
age(max(date), min(date)) as age
from transactions t
) t cross join
(select count(*) as cnt
from (select date_trunc('month', date) as yyyymm, count(*) as cnt, sum(amount) as amount
from transactions t
where amount < 0
group by yyyymm
having count(*) >= 3 and sum(amount) <= -100
) tt
) tt;
Here is a db<>fiddle.
This calculates 2756, which appears to follow your rules when the fee applies only to the months spanned by the transactions (January through October here). If you want the full year, just use 12 instead of the month count calculated with age().
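The difference between 2756 and the 2746 expected in the question comes down to how many months are charged. The arithmetic can be checked with a minimal Python sketch. The helper `months_spanned` is a hypothetical name, and it counts calendar months inclusively, which matches the query's result for this sample but is not a reimplementation of Postgres's age().

```python
from datetime import date


def months_spanned(first: date, last: date) -> int:
    """Count calendar months from first to last, inclusive of both ends."""
    return (last.year - first.year) * 12 + (last.month - first.month) + 1


total = 2801          # sum of all amounts in the sample table
fee_free_months = 1   # only March has >= 3 payments totalling >= 100

# Charge the fee only for the months between the first and last transaction.
span = months_spanned(date(2020, 1, 6), date(2020, 10, 10))
balance_by_span = total - 5 * (span - fee_free_months)

# Charge the fee for all 12 months of the year instead.
balance_full_year = total - 5 * (12 - fee_free_months)

print(span, balance_by_span, balance_full_year)
```

With the sample data the span is 10 months, so 9 months are charged in the first variant and 11 in the second.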

I would first left join a generate_series that represents the months you are interested in (in this case, all months of 2020) with the transactions. That adds the missing months with an amount of 0.
Then I aggregate these values per month, computing the sum of the negative amounts and the number of negative transactions for each month.
Finally, I calculate the grand total and subtract the fee for each month that does not meet the criteria.
SELECT sum(amount_per_month) -
sum(5) FILTER (WHERE negative_per_month > -100 OR negative_count < 3)
FROM (SELECT sum(amount) AS amount_per_month,
sum(amount) FILTER (WHERE amount < 0) AS negative_per_month,
month_start,
count(*) FILTER (WHERE amount < 0) AS negative_count
FROM (SELECT coalesce(t.amount, 0) AS amount,
coalesce(date_trunc('month', CAST (t.date AS timestamp)), dates.d) AS month_start
FROM generate_series(
TIMESTAMP '2020-01-01',
TIMESTAMP '2020-12-01',
INTERVAL '1 month'
) AS dates (d)
LEFT JOIN transactions AS t
ON dates.d = date_trunc('month', CAST (t.date AS timestamp))
) AS gaps_filled
GROUP BY month_start
) AS sums_per_month;
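The three steps described above (fill in the missing months, aggregate the payments per month, subtract the fee for each month that fails the criteria) can be cross-checked outside the database. This is only a sketch in plain Python, not a translation of the query. The function name `balance_with_fees` and its parameters are made up for illustration, and it assumes all transactions fall within one calendar year.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (amount, date)
transactions = [
    (1000, date(2020, 1, 6)), (-10, date(2020, 1, 14)),
    (-75, date(2020, 1, 20)), (-5, date(2020, 1, 25)),
    (-4, date(2020, 1, 29)), (2000, date(2020, 3, 10)),
    (-75, date(2020, 3, 12)), (-20, date(2020, 3, 15)),
    (40, date(2020, 3, 15)), (-50, date(2020, 3, 17)),
    (200, date(2020, 10, 10)), (-200, date(2020, 10, 10)),
]


def balance_with_fees(rows, year=2020, fee=5, min_payments=3, min_total=100):
    # Collect the payments (negative amounts) of each month as positive numbers.
    payments = defaultdict(list)
    for amount, day in rows:
        if amount < 0 and day.year == year:
            payments[day.month].append(-amount)
    # Looping over all 12 months plays the role of generate_series:
    # months without any rows simply have an empty payment list.
    fee_months = sum(
        1
        for month in range(1, 13)
        if len(payments[month]) < min_payments
        or sum(payments[month]) < min_total
    )
    return sum(amount for amount, _ in rows) - fee * fee_months


balance = balance_with_fees(transactions)
print(balance)
```

January has four payments but they total only 94, so it is charged; March is the only fee-free month.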

This would be my solution, using CTEs.
DB fiddle here.
balance
2746
Code:
WITH monthly_credited_transactions
AS (SELECT Date_part('month', date) AS cred_month,
Sum(CASE
WHEN amount < 0 THEN Abs(amount)
ELSE 0
END) AS credited_amount,
Sum(CASE
WHEN amount < 0 THEN 1
ELSE 0
END) AS credited_cnt
FROM transactions
GROUP BY 1),
credit_fee
AS (SELECT ( 12 - Count(1) ) * 5 AS fee,
1 AS id
FROM monthly_credited_transactions
WHERE credited_amount >= 100
AND credited_cnt >= 3),
trans
AS (SELECT Sum(amount) AS amount,
1 AS id
FROM transactions)
SELECT amount - fee AS balance
FROM trans a
LEFT JOIN credit_fee b
ON a.id = b.id

The query below worked for me (adapted from @GordonLinoff's answer):
select CAST(totalamount - 5 *(12 - extract(month from firstt) + 1 - nofeemonths) AS int) as balance
from (select sum(amount) as totalamount, min(date) as firstt
from transactions t
) t cross join
(select count(*) as nofeemonths
from (select date_trunc('month', date) as months, count(*) as nofeemonths, sum(amount) as totalamount
from transactions t
where amount < 0
group by months
having count(*) >= 3 and sum(amount) <= -100
) tt
) tt;
Here firstt is the date of the first transaction in the year, and 12 - extract(month from firstt) + 1 - nofeemonths is the number of months for which the monthly fee of 5 is charged.

Related

SQL/DB2 getting single row of results per employee with a UNION

I'm currently using a UNION on 2 select statements and while I'm getting the correct data, it's not exactly what I actually need when it comes time to use it in a front-end view
I'm currently using this query:
SELECT
T.employee as employee,
'Orders' as TYPE,
SUM(CASE WHEN t.order_count < QUANT THEN t.order_count ELSE QUANT END) as DATA
FROM schemaOne.order_list T
WHERE t.order_date > CURRENT_DATE - 35 DAYS
group by t.employee
UNION
select
T.employee as employee,
'Sales' as TYPE,
sum(price * quant) as DATA
from schemaOne.sales T
WHERE T.sales_date > CURRENT_DATE - 35 DAYS
group by T.employee
order by data desc;
with these dummy tables as examples and getting the following result:
order_list
employee | order_count | quant | order_date
--------------------------------------------------
123 | 5 | 1 | 2022-03-02
456 | 1 | 5 | 2022-03-02
sales
employee | price | quant | sales_date
--------------------------------------------------
123 | 500 | 1 | 2022-03-02
456 | 1000 | 1 | 2022-03-02
Result
employee | type | data
------------------------------------------
123 Orders 1
123 Sales 500
456 Orders 5
456 Sales 1000
Is there a way to use a UNION but alter it so that I can instead get a single row for each employee and just get rid of the type/data columns and instead set each piece of data to the desired column (the type would instead be the column name ) like so:
Desired Result
employee | Orders | Sales
---------------------------------
123 | 1 | 500
456 | 5 | 1000
Try adding an outer query:
select employee,
MAX(case when type = 'Orders' then data end) as Orders,
MAX(case when type = 'Sales' then data end) as Sales
from (
SELECT T.employee as employee,
'Orders' as TYPE,
SUM(CASE WHEN t.order_count < QUANT THEN t.order_count ELSE QUANT END) as DATA
FROM schemaOne.order_list T
WHERE t.order_date > CURRENT_DATE - 35 DAYS
group by t.employee
UNION
select T.employee as employee,
'Sales' as TYPE,
sum(price * quant) as DATA
from schemaOne.sales T
WHERE T.sales_date > CURRENT_DATE - 35 DAYS
group by T.employee
) as t1
GROUP BY employee;
Note that I removed "order by data desc", since it has no effect inside the subquery.
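The MAX(CASE ...) pattern in the outer query is a conditional aggregation: each long row contributes its value to exactly one output column. As a rough illustration only, the same pivot can be written in plain Python. The function name `pivot_by_employee` is hypothetical and the rows are the result set shown in the question.

```python
from collections import defaultdict

# Long format, as returned by the UNION query: (employee, type, data)
rows = [
    (123, "Orders", 1), (123, "Sales", 500),
    (456, "Orders", 5), (456, "Sales", 1000),
]


def pivot_by_employee(long_rows):
    wide = defaultdict(dict)
    for employee, kind, value in long_rows:
        # Equivalent of MAX(CASE WHEN type = kind THEN data END) per employee.
        current = wide[employee].get(kind)
        wide[employee][kind] = value if current is None else max(current, value)
    return dict(wide)


result = pivot_by_employee(rows)
for employee, columns in result.items():
    print(employee, columns["Orders"], columns["Sales"])
```

Because each employee has at most one row per type here, MAX just picks that single value; it is only needed to satisfy the GROUP BY.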
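You can also join the tables on the employee column (this assumes at most one matching row per employee and date on each side; otherwise the join duplicates rows and inflates the sums):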
SELECT o.employee,
SUM(CASE
WHEN o.order_count < o.quant THEN
o.order_count
ELSE
o.quant
END) AS Orders,
SUM(s.price * s.quant) AS Sales
FROM schemaOne.order_list o
JOIN schemaOne.sales s
ON s.employee = o.employee
AND s.sales_date = o.order_date
WHERE o.order_date > current_date - 35 DAYS
GROUP BY o.employee

How to get daily budget based on monthly budget and workings days

I have 2 tables:
one with the monthly budget and one with working days.
I want to work out the daily budget based on the monthly budget and the working days.
Example:
August has a budget of 1000 and 21 working days.
September has a budget of 2000 and 23 working days.
I want to work out the total budget between two dates,
e.g. between 2020-08-02 and 2020-09-15,
but days in August must take their budget from August, days in September from September, and so on.
tbBudget:
Date | Amount
2020-08-01 | 1000
2020-09-01 | 2000
2020-10-01 | 3000
tbWorkingDays
Date | WorkingDay
2020-08-01 | 0
2020-08-02 | 0
2020-08-03 | 1
2020-08-04 | 1
2020-08-05 | 1
2020-08-06 | 1
2020-08-07 | 1
2020-08-08 | 1
...
2020-09-01 | 1
2020-09-02 | 1
2020-09-03 | 0
2020-09-04 | 1
...
2020-10-01 | 1
2020-10-02 | 0
2020-10-03 | 1
2020-10-04 | 1
I have no idea how to solve this issue. Can you help me?
My result should be like:
Date | WorkingDay | BudgetAmount
2020-08-02 | 0 | 0.0
2020-08-03 | 1 | 47.6
2020-08-04 | 1 | 47.6
2020-08-05 | 1 | 47.6
..
2020-09-13 | 1 | 86.9
2020-09-14 | 1 | 86.9
2020-09-15 | 1 | 86.9
Using CTE and group by:
with CTE1 AS(
SELECT FORMAT(A.DATE, 'MMyyyy') DATE, B.AMOUNT, SUM(CASE WHEN [WorkingDay] = 1 THEN 1 ELSE 0 END) AS TOTAL_WORKING_DAYS
FROM tbWorkingDays A INNER JOIN tbBudget B
ON (FORMAT(A.DATE, 'MMyyyy') = FORMAT(B.DATE, 'MMyyyy')) GROUP BY FORMAT(A.[DATE], 'MMyyyy'), B.AMOUNT
)
SELECT A.DATE,
A.WORKINGDAY,
CASE WHEN A.WORKINGDAY = 1 THEN B.AMOUNT/B.TOTAL_WORKING_DAYS
ELSE 0 END AS BudgetAmount
FROM CTE1 B
INNER JOIN
tbWorkingDays A
ON (FORMAT(A.DATE, 'MMyyyy') = B.DATE);
Assuming that the budgets are by month:
select wd.*,
(case when wd.workingday = 0 then 0
else b.amount * 1.0 / sum(wd.workingday) over (partition by b.date)
end) as daily_amount
from tbWorkingDays wd join
tbBudget b
on wd.date >= b.date and wd.date < dateadd(month, 1, b.date);
If the budget dates are not per month, then use apply instead:
select wd.*,
(case when wd.workingday = 0 then 0
else b.amount * 1.0 / sum(wd.workingday) over (partition by b.date)
end) as daily_amount
from tbWorkingDays wd cross apply
(select top (1) b.*
from tbBudget b
where wd.date >= b.date
order by b.date desc
) b;
Use SUM as an analytic (window) function to get the number of working days per month, then divide the monthly budget by it.
Here is a working solution:
with tally as
(
SELECT
row_number() over (order by (select null))-1 n
from (values (null),(null),(null),(null),(null),(null),(null),(null),(null),(null),(null)) a(a)
cross join (values (null),(null),(null),(null),(null),(null),(null),(null),(null),(null),(null)) b(b)
cross join (values (null),(null),(null),(null),(null),(null),(null),(null),(null),(null),(null)) c(c)
)
, tbWorkingDays as
(
select
cast(dateadd(day,n,'2020-01-01') as date) [Date],
iif(DATEPART(WEEKDAY,cast(dateadd(day,n,'2020-01-01') as date)) in (1,7),0,1) WorkingDay
from tally
where n<365
)
, tbBudget AS
(
select * from
(values
(cast('2020-08-01' as date), cast(1000 as decimal(19,2)))
,(cast('2020-09-01' as date), cast(2000 as decimal(19,2)))
,(cast('2020-10-01' as date), cast(3000 as decimal(19,2)))
) a([Date],[Amount])
)
select
a.[Date]
,a.WorkingDay*
(b.Amount/
sum(a.WorkingDay) over (partition by year(a.Date)*100+month(a.Date)))
from tbWorkingDays a
inner join tbBudget b
on a.Date between b.Date and dateadd(day,-1,dateadd(month,1,b.date))
The work is done here:
select
a.[Date]
,a.WorkingDay*
(b.Amount/
sum(a.WorkingDay) over (partition by year(a.Date)*100+month(a.Date)))
from tbWorkingDays a
inner join tbBudget b
on a.Date between b.Date and dateadd(day,-1,dateadd(month,1,b.date))
The expression
sum(a.WorkingDay) over (partition by year(a.Date)*100+month(a.Date))
sums the number of working days in the current month. I then join against the budget and divide the monthly amount by the expression above.
To make sure only working days receive a budget, I simply multiply by WorkingDay: since 0 marks a non-working day, the result is 0 for all non-working days.
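The division described above can be worked through in a short Python sketch. It is illustrative only: it assumes Monday to Friday are working days (the question's tbWorkingDays table uses its own calendar, e.g. 23 working days in September), and the names `budgets`, `daily_budget` and `total_budget` are hypothetical.

```python
from datetime import date, timedelta

# Monthly budgets keyed by (year, month), taken from tbBudget in the question.
budgets = {(2020, 8): 1000.0, (2020, 9): 2000.0, (2020, 10): 3000.0}


def is_working_day(day: date) -> bool:
    # Assumption: Monday (0) to Friday (4) are working days, no holidays.
    return day.weekday() < 5


def working_days_in_month(year: int, month: int) -> int:
    day, count = date(year, month, 1), 0
    while day.month == month:
        count += is_working_day(day)
        day += timedelta(days=1)
    return count


def daily_budget(day: date) -> float:
    # Non-working days get 0; working days get an equal share of the month.
    if not is_working_day(day):
        return 0.0
    return budgets[(day.year, day.month)] / working_days_in_month(day.year, day.month)


def total_budget(start: date, end: date) -> float:
    total, day = 0.0, start
    while day <= end:
        total += daily_budget(day)
        day += timedelta(days=1)
    return total


total = total_budget(date(2020, 8, 2), date(2020, 9, 15))
print(round(daily_budget(date(2020, 8, 3)), 1), round(total, 2))
```

Under the weekday assumption, August 2020 has 21 working days, so each one carries about 47.6, which agrees with the figure in the question.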

Find the customers and other metrics based on the purchase frequency new & repeat

I am trying to find the customer count and sales by the type of customer (New and Returning) and the number of times they have purchased.
txn_date Customer_ID Transaction_Number Sales Reference(not in the SQL table) customer type (not in the sql table)
1/2/2019 1 12345 $10 Second Purchase SLS Repeat
4/3/2018 1 65890 $20 First Purchase SLS Repeat
3/22/2019 3 64453 $30 First Purchase SLS new
4/3/2019 4 88567 $20 First Purchase SLS new
5/21/2019 4 85446 $15 Second Purchase SLS new
1/23/2018 5 89464 $40 First Purchase SLS Repeat
4/3/2019 5 99674 $30 Second Purchase SLS Repeat
4/3/2019 6 32224 $20 Second Purchase SLS Repeat
1/23/2018 6 46466 $30 First Purchase SLS Repeat
1/20/2018 7 56558 $30 First Purchase SLS new
I am using the below code to get the aggregate sales and customer count for the total customers:
select seqnum, count(distinct customer_id), sum(sales) from (
select co.*,
row_number() over (partition by customer_id order by txn_date) as seqnum
from somya co)
group by seqnum
order by seqnum;
I want to get the same data by the customer type:
for example for the new customers my result should show:
New Customers | Customer_Count | Sum(Sales)
1st Purchase | 3 | $80
2nd Purchase | 1 | $15
Returning Customers | Customer_Count | Sum(Sales)
1st Purchase | 3 | $90
2nd Purchase | 3 | $60
I am trying the below query to get the data for new and repeat customers:
New Customers:
select seqnum, count(distinct customer_id), sum(sales)
from (
select co.*,
row_number() over (partition by customer_id order by trunc(txn_date)) as seqnum,
MIN (TRUNC (TXN_DATE)) OVER (PARTITION BY customer_id) as MIN_TXN_DATE
from somya co
)
where MIN_TXN_DATE between '01-JAN-19' and '31-DEC-19'
group by seqnum
order by seqnum asc;
Returning Customers:
select seqnum, count(distinct customer_id), sum(sales)
from (
select co.*,
row_number() over (partition by customer_id order by trunc(txn_date)) as seqnum,
MIN (TRUNC (TXN_DATE)) OVER (PARTITION BY customer_id) as MIN_TXN_DATE
from somya co
)
where MIN_TXN_DATE <'01-JAN-19'
group by seqnum
order by seqnum asc;
I cannot figure out what is wrong with my query, or whether there is a problem with my logic.
This is just sample data. I have transactions from all years in my database, so I need to restrict the transaction date in the query. But as soon as I narrow the data down by transaction date, the repeat customer query returns nothing and the new customer query returns the total number of customers for that period.
If I understand correctly, you need to know the first time someone becomes a customer. And then use this:
select (case when first_year < 2019 then 'returning' else 'new' end) as custtype,
seqnum, count(*), sum(sales)
from (select co.*,
row_number() over (partition by customer_id, extract(year from txn_date) order by txn_date) as seqnum,
min(extract(year from txn_date)) over (partition by customer_id) as first_year
from somya co
) s
where txn_date >= date '2019-01-01' and
txn_date < date '2020-01-01'
group by (case when first_year < 2019 then 'returning' else 'new' end),
seqnum
order by custtype, seqnum;
You can categorize your sales data to assign a customer type and a purchase sequence using windowing functions, like this:
SELECT sd.txn_date,
sd.customer_id,
sd.transaction_number,
sd.sales,
case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01'
AND max(txn_date) OVER ( partition by customer_id ) >= DATE '2019-01-01'
THEN 'Repeat'
ELSE 'New' END customer_type,
row_number() over ( partition by customer_id order by txn_date) purchase_sequence
FROM sales_data sd
+-----------+-------------+--------------------+-------+---------------+-------------------+
| TXN_DATE | CUSTOMER_ID | TRANSACTION_NUMBER | SALES | CUSTOMER_TYPE | PURCHASE_SEQUENCE |
+-----------+-------------+--------------------+-------+---------------+-------------------+
| 03-APR-18 | 1 | 65890 | 20 | Repeat | 1 |
| 02-JAN-19 | 1 | 12345 | 10 | Repeat | 2 |
| 22-MAR-19 | 3 | 64453 | 30 | New | 1 |
| 03-APR-19 | 4 | 88567 | 20 | New | 1 |
| 21-MAY-19 | 4 | 85446 | 15 | New | 2 |
| 23-JAN-18 | 5 | 89464 | 40 | Repeat | 1 |
| 03-APR-19 | 5 | 99674 | 30 | Repeat | 2 |
| 23-JAN-18 | 6 | 46466 | 30 | Repeat | 1 |
| 03-APR-19 | 6 | 32224 | 20 | Repeat | 2 |
| 20-JAN-18 | 7 | 56558 | 30 | New | 1 |
+-----------+-------------+--------------------+-------+---------------+-------------------+
Then, you can wrap that in a common table expression (aka "WITH" clause) and summarize by the customer type and purchase sequence:
WITH categorized_sales_data AS (
SELECT sd.txn_date,
sd.customer_id,
sd.transaction_number,
sd.sales,
case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01' AND max(txn_date) OVER ( partition by customer_id ) >= DATE '2019-01-01' THEN 'Repeat' ELSE 'New' END customer_type,
row_number() over ( partition by customer_id order by txn_date) purchase_sequence
FROM sales_data sd)
SELECT customer_type, purchase_sequence, count(*), sum(sales)
FROM categorized_sales_data
group by customer_type, purchase_sequence
order by customer_type, purchase_sequence
+---------------+-------------------+----------+------------+
| CUSTOMER_TYPE | PURCHASE_SEQUENCE | COUNT(*) | SUM(SALES) |
+---------------+-------------------+----------+------------+
| New | 1 | 3 | 80 |
| New | 2 | 1 | 15 |
| Repeat | 1 | 3 | 90 |
| Repeat | 2 | 3 | 60 |
+---------------+-------------------+----------+------------+
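To trace how those four summary rows come about, here is a small Python sketch that applies the same rule as the CASE expression above (first purchase before 2019 and last purchase in 2019 or later means 'Repeat', anything else 'New') to the sample rows. The names `sales`, `CUTOFF` and `summarize` are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (txn_date, customer_id, sales)
sales = [
    (date(2019, 1, 2), 1, 10), (date(2018, 4, 3), 1, 20),
    (date(2019, 3, 22), 3, 30), (date(2019, 4, 3), 4, 20),
    (date(2019, 5, 21), 4, 15), (date(2018, 1, 23), 5, 40),
    (date(2019, 4, 3), 5, 30), (date(2019, 4, 3), 6, 20),
    (date(2018, 1, 23), 6, 30), (date(2018, 1, 20), 7, 30),
]
CUTOFF = date(2019, 1, 1)


def summarize(rows):
    by_customer = defaultdict(list)
    for txn_date, customer_id, amount in rows:
        by_customer[customer_id].append((txn_date, amount))
    summary = defaultdict(lambda: [0, 0])  # (type, sequence) -> [count, sales]
    for purchases in by_customer.values():
        purchases.sort()  # like ORDER BY txn_date within the partition
        first, last = purchases[0][0], purchases[-1][0]
        kind = "Repeat" if first < CUTOFF <= last else "New"
        # enumerate plays the role of row_number() per customer
        for seq, (_, amount) in enumerate(purchases, start=1):
            summary[(kind, seq)][0] += 1
            summary[(kind, seq)][1] += amount
    return {key: tuple(value) for key, value in sorted(summary.items())}


result = summarize(sales)
for (kind, seq), (count, total) in result.items():
    print(kind, seq, count, total)
```

Note that customer 7, whose only purchase is in 2018, lands in 'New' under this rule because they have no purchase in 2019 or later.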
Here's a full SQL with test data:
with sales_data (txn_date, Customer_ID, Transaction_Number, Sales ) as (
SELECT TO_DATE('1/2/2019','MM/DD/YYYY'), 1, 12345, 10 FROM DUAL UNION ALL
SELECT TO_DATE('4/3/2018','MM/DD/YYYY'), 1, 65890, 20 FROM DUAL UNION ALL
SELECT TO_DATE('3/22/2019','MM/DD/YYYY'), 3, 64453, 30 FROM DUAL UNION ALL
SELECT TO_DATE('4/3/2019','MM/DD/YYYY'), 4, 88567, 20 FROM DUAL UNION ALL
SELECT TO_DATE('5/21/2019','MM/DD/YYYY'), 4, 85446, 15 FROM DUAL UNION ALL
SELECT TO_DATE('1/23/2018','MM/DD/YYYY'), 5, 89464, 40 FROM DUAL UNION ALL
SELECT TO_DATE('4/3/2019','MM/DD/YYYY'), 5, 99674, 30 FROM DUAL UNION ALL
SELECT TO_DATE('4/3/2019','MM/DD/YYYY'), 6, 32224, 20 FROM DUAL UNION ALL
SELECT TO_DATE('1/23/2018','MM/DD/YYYY'), 6, 46466, 30 FROM DUAL UNION ALL
SELECT TO_DATE('1/20/2018','MM/DD/YYYY'), 7, 56558, 30 FROM DUAL ),
-- Query starts here
/* WITH */ categorized_sales_data AS (
SELECT sd.txn_date,
sd.customer_id,
sd.transaction_number,
sd.sales,
case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01' AND max(txn_date) OVER ( partition by customer_id ) >= DATE '2019-01-01' THEN 'Repeat' ELSE 'New' END customer_type,
row_number() over ( partition by customer_id order by txn_date) purchase_sequence
FROM sales_data sd)
SELECT customer_type, purchase_sequence, count(*), sum(sales)
FROM categorized_sales_data
group by customer_type, purchase_sequence
order by customer_type, purchase_sequence
Response to comment from OP
all the customers whose first purchase date is in 2019 would be a new customer. Any customer who has transacted in 2019 but their first purchase date is before 2019 would be a repeat customer
So, change
case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01'
AND max(txn_date) OVER ( partition by customer_id ) >= DATE '2019-01-01'
THEN 'Repeat' ELSE 'New' END customer_type
to
case when min(txn_date) over ( partition by customer_id )
BETWEEN DATE '2019-01-01' AND DATE '2020-01-01' - INTERVAL '1' SECOND
THEN 'New' ELSE 'Repeat' END customer_type
i.e., if and only if a customer's first purchase was in 2019 then they are "new".

Get last known record per month in BigQuery

Account balance collection, that shows the account balance of a customer at a given day:
+---------------+---------+------------+
| customer_id | value | timestamp |
+---------------+---------+------------+
| 1 | -500 | 2019-10-12 |
| 1 | -300 | 2019-10-11 |
| 1 | -200 | 2019-10-10 |
| 1 | 0 | 2019-10-09 |
| 2 | 200 | 2019-09-10 |
| 1 | 600 | 2019-09-02 |
+---------------+---------+------------+
Notice, that customer #2 had no updates to his account balance in October.
I want to get the last account balance per customer per month. If there has been no account balance update for a customer in a given month, the last known account balance should be transferred to the current month. The result should look like that:
+---------------+---------+------------+
| customer_id | value | timestamp |
+---------------+---------+------------+
| 1 | -500 | 2019-10-12 |
| 2 | 200 | 2019-10-10 |
| 2 | 200 | 2019-09-10 |
| 1 | 600 | 2019-09-02 |
+---------------+---------+------------+
Since the account balance of customer #2 was not updated in October but in September, we create a copy of the row from September changing the date to October. Any ideas how to achieve this in BigQuery?
Below is for BigQuery Standard SQL
#standardSQL
WITH customers AS (
SELECT DISTINCT customer_id FROM `project.dataset.table`
), months AS (
SELECT month FROM (
SELECT DATE_TRUNC(MIN(timestamp), MONTH) min_month, DATE_TRUNC(MAX(timestamp), MONTH) max_month
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_month, max_month, INTERVAL 1 MONTH)) month
)
SELECT customer_id,
IFNULL(value, LEAD(value) OVER(win)) value,
IFNULL(timestamp, DATE_ADD(LEAD(timestamp) OVER(win), INTERVAL DATE_DIFF(month, LEAD(month) OVER(win), MONTH) MONTH)) timestamp
FROM months, customers
LEFT JOIN (
SELECT DATE_TRUNC(timestamp, MONTH) month, customer_id,
ARRAY_AGG(STRUCT(value, timestamp) ORDER BY timestamp DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.table`
GROUP BY month, customer_id
) USING(month, customer_id)
WINDOW win AS (PARTITION BY customer_id ORDER BY month DESC)
You can test it with sample data based on your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 customer_id, -500 value, DATE '2019-10-12' timestamp UNION ALL
SELECT 1, -300, '2019-10-11' UNION ALL
SELECT 1, -200, '2019-10-10' UNION ALL
SELECT 2, 200, '2019-09-10' UNION ALL
SELECT 2, 100, '2019-08-11' UNION ALL
SELECT 2, 50, '2019-07-12' UNION ALL
SELECT 1, 600, '2019-09-02'
), customers AS (
SELECT DISTINCT customer_id FROM `project.dataset.table`
), months AS (
SELECT month FROM (
SELECT DATE_TRUNC(MIN(timestamp), MONTH) min_month, DATE_TRUNC(MAX(timestamp), MONTH) max_month
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_month, max_month, INTERVAL 1 MONTH)) month
)
SELECT customer_id,
IFNULL(value, LEAD(value) OVER(win)) value,
IFNULL(timestamp, DATE_ADD(LEAD(timestamp) OVER(win), INTERVAL DATE_DIFF(month, LEAD(month) OVER(win), MONTH) MONTH)) timestamp
FROM months, customers
LEFT JOIN (
SELECT DATE_TRUNC(timestamp, MONTH) month, customer_id,
ARRAY_AGG(STRUCT(value, timestamp) ORDER BY timestamp DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.table`
GROUP BY month, customer_id
) USING(month, customer_id)
WINDOW win AS (PARTITION BY customer_id ORDER BY month DESC)
-- ORDER BY month DESC, customer_id
result is
Row customer_id value timestamp
1 1 -500 2019-10-12
2 2 200 2019-10-10
3 1 600 2019-09-02
4 2 200 2019-09-10
5 1 null null
6 2 100 2019-08-11
7 1 null null
8 2 50 2019-07-12
The following query should mostly answer your question by creating a 'month-end' record for each customer for every month and getting the most recent balance:
with
-- Generate a set of months
month_begins as (
select dt from unnest(generate_date_array('2019-01-01','2019-12-01', interval 1 month)) dt
),
-- Get the month ends
month_ends as (
select date_sub(date_add(dt, interval 1 month), interval 1 day) as month_end_date from month_begins
),
-- Cross Join and group so we get 1 customer record for every month to account for
-- situations where customer doesn't change balance in a month
user_month_ends as (
select
customer_id,
month_end_date
from `project.dataset.table`
cross join month_ends
group by 1,2
),
-- Fan out so for each month end, you get all balances prior to month end for each customer
values_prior_to_month_end as (
select
customer_id,
value,
timestamp,
month_end_date
from `project.dataset.table`
inner join user_month_ends using(customer_id)
where timestamp <= month_end_date
),
-- Order by most recent balance before month end, even if it was more than 1+ months ago
ordered as (
select
*,
row_number() over (partition by customer_id, month_end_date order by timestamp desc) as my_row
from values_prior_to_month_end
),
-- Finally, select only the most recent record for each customer per month
final as (
select
* except(my_row)
from ordered
where my_row = 1
)
select * from final
order by customer_id, month_end_date desc
A few caveats:
I did not order results to match your desired result set, and I also kept a month-end date to illustrate the concept. You can easily change the ordering and exclude unneeded fields.
In the month_begins CTE, I set a range of months that extends into the future, so your result set will contain the most recent balance for 'future months' as well. To make this a bit prettier, consider changing '2019-12-01' to current_date() (without quotes) so that your query always runs up to the end of the current month.
Your timestamp field looks to be dates, so I used date logic, but you should be able to apply the same principles to use timestamp logic if your underlying fields are actual timestamps.
In your result set, I'm not sure why your 2nd row (customer 2) would have a timestamp of '2019-10-10', that seems arbitrary as customer 2 has no 2nd balance record.
I purposely split the logic into several CTEs so I could comment on each step more easily; you could certainly combine several steps in the same code block for a more condensed query.
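The core idea of the month-end approach (for every customer and month, take the most recent record dated on or before the end of that month) can be sketched in plain Python. This is an illustration under assumptions, not BigQuery code: the function `last_known_per_month` is hypothetical, months are passed in explicitly as (year, month) pairs, and the rows are the sample table from the question.

```python
from datetime import date

# Sample rows from the question: (customer_id, value, date)
balances = [
    (1, -500, date(2019, 10, 12)), (1, -300, date(2019, 10, 11)),
    (1, -200, date(2019, 10, 10)), (1, 0, date(2019, 10, 9)),
    (2, 200, date(2019, 9, 10)), (1, 600, date(2019, 9, 2)),
]


def last_known_per_month(rows, months):
    """Return {(customer_id, (year, month)): value} with carry-forward."""
    result = {}
    for customer in sorted({c for c, _, _ in rows}):
        # This customer's records in chronological order.
        history = sorted((d, v) for c, v, d in rows if c == customer)
        for year, month in months:
            # Keep every record dated up to the end of this month.
            seen = [v for d, v in history if (d.year, d.month) <= (year, month)]
            if seen:  # customers with no record yet get no row
                result[(customer, (year, month))] = seen[-1]
    return result


latest = last_known_per_month(balances, [(2019, 9), (2019, 10)])
for key, value in latest.items():
    print(key, value)
```

Customer 2 has no October record, so the September balance of 200 is carried into October, however many months back the last record lies.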

Combine data from a table to one row T-SQL

I have a table in SQL Server 2008 that holds transaction data. The table looks like this, and I would like to produce the result described below with a SQL statement.
TransactionId|TransactionDate|TransactionType|Amount|Balance|UserId
The transaction type can be one of four types: Deposit, Withdrawals, Profit and Stake. Here is an example of what the transaction table can look like. The balance is the running sum of the Amount column.
TransactionId|TransactionDate|TransactionType|Amount|Balance|UserId
1| 2013-03-25| Deposit| 150| 150| 1
2| 2013-03-27| Stake| -20| 130| 1
3| 2013-03-28| Profit | 1500| 1630| 1
4 | 2013-03-29| Withdrawals| -700| 930| 1
5| 2013-03-29| Stake | -230 | 700 | 1
6| 2013-04-04| Stake| -150 | 550| 1
7| 2013-04-06| Stake | -150 | 400| 1
What I want now is to get a select statement that gives me all data grouped by week. The result should look like this.
Week|Deposit|Withdrawals|Stake|Profit|Balance|Year
13 | 150| -700 | -250 | 1500 | 700 | 2013
14 | 0 | 0 | -300| 0 | 400 | 2013
I also have a problem with the weeks. I live in Europe and my week starts on Monday. I have a solution for that, but around the end of a year I sometimes get week 54, even though there are only 52 weeks in a year...
I hope someone can help me out.
This is what I have so far.
SELECT transactionid,
transactiondate,
transactiontype,
amount,
(SELECT Sum(amount)
FROM transactions AS trans_
WHERE trans_.transactiondate <= trans.transactiondate
AND userid = 1) AS Balance,
userid,
Datepart(week, transactiondate) AS Week,
Datepart(year, transactiondate) AS Year
FROM transactions trans
WHERE userid = 1
ORDER BY transactiondate DESC,
transactionid DESC
Here's sample data and my query on sql-fiddle: http://www.sqlfiddle.com/#!3/79d65/92/0
In order to transform the data from the rows into columns, you will want to use the PIVOT function.
You did not specify what balance value you want to return, but based on the final result it looks like you want the balance associated with the last transaction in each week. If that is not correct, then please clarify what the logic should be.
To get the result you will want to use the DATEPART and YEAR functions. These allow grouping by both the week and the year.
The following query should get the result that you want:
select week,
coalesce(Deposit, 0) Deposit,
coalesce(Withdrawals, 0) Withdrawals,
coalesce(Stake, 0) Stake,
coalesce(Profit, 0) Profit,
Balance,
Year
from
(
select datepart(week, t1.transactiondate) week,
t1.transactiontype,
t2.balance,
t1.amount,
year(t1.transactiondate) year
from transactions t1
cross apply
(
select top 1 balance
from transactions t2
where datepart(week, t1.transactiondate) = datepart(week, t2.transactiondate)
and year(t1.transactiondate) = year(t2.transactiondate)
and t1.userid = t2.userid
order by TransactionId desc
) t2
) d
pivot
(
sum(amount)
for transactiontype in (Deposit, Withdrawals, Stake, Profit)
) piv;
See SQL Fiddle with Demo. The result is:
| WEEK | DEPOSIT | WITHDRAWALS | STAKE | PROFIT | BALANCE | YEAR |
------------------------------------------------------------------
| 13 | 150 | -700 | -250 | 1500 | 700 | 2013 |
| 14 | 0 | 0 | -300 | 0 | 400 | 2013 |
As a side note, since your week starts on Monday, you may need SET DATEFIRST 1 to set the first day of the week.
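For comparison, here is a minimal Python sketch of the same weekly pivot using ISO 8601 week numbering, where weeks start on Monday and a year never has more than 53 weeks. Using ISO weeks is an assumption on my part and not what DATEPART(week, ...) does by default; as far as I know SQL Server offers DATEPART(ISO_WEEK, ...) for this. The function name `weekly_summary` is hypothetical.

```python
from datetime import date

# Sample rows from the question: (id, date, type, amount, balance)
transactions = [
    (1, date(2013, 3, 25), "Deposit", 150, 150),
    (2, date(2013, 3, 27), "Stake", -20, 130),
    (3, date(2013, 3, 28), "Profit", 1500, 1630),
    (4, date(2013, 3, 29), "Withdrawals", -700, 930),
    (5, date(2013, 3, 29), "Stake", -230, 700),
    (6, date(2013, 4, 4), "Stake", -150, 550),
    (7, date(2013, 4, 6), "Stake", -150, 400),
]


def weekly_summary(rows):
    weeks = {}
    for txn_id, day, kind, amount, balance in sorted(rows):
        iso = day.isocalendar()          # (ISO year, ISO week, ISO weekday)
        key = (iso[0], iso[1])
        week = weeks.setdefault(
            key, {"Deposit": 0, "Withdrawals": 0, "Stake": 0, "Profit": 0}
        )
        week[kind] += amount             # one column per transaction type
        week["Balance"] = balance        # rows come in id order, last one wins
    return weeks


summary = weekly_summary(transactions)
for (year, week), columns in summary.items():
    print(year, week, columns)
```

With ISO numbering, a date such as 2012-12-31 belongs to week 1 of 2013, which is why the ISO year is kept next to the week number instead of the calendar year.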
Another option, without using PIVOT, but rather with few CASEs
WITH CTE AS
(
SELECT
TransactionId
,TransactionDate
,DATEPART(WEEK, TransactionDate) AS Week
,CASE WHEN TransactionType='Deposit' THEN Amount ELSE 0 END AS Deposit
,CASE WHEN TransactionType='Stake' THEN Amount ELSE 0 END AS Stake
,CASE WHEN TransactionType='Profit' THEN Amount ELSE 0 END AS Profit
,CASE WHEN TransactionType='Withdrawals' THEN Amount ELSE 0 END AS Withdrawals
,Balance
,DATEPART(YEAR, TransactionDate) AS Year
FROM dbo.Transactions
)
SELECT
Week, SUM(Deposit) AS Deposit, SUM(Withdrawals) AS Withdrawals, SUM(Stake) AS Stake, SUM(Profit) AS Profit,
(SELECT Balance FROM CTE i WHERE i.TransactionID = MAX(o.TransactionID)) AS BAlance, Year
FROM CTE o
GROUP BY Week, Year
SQLFiddle Demo
http://www.sqlfiddle.com/#!3/79d65/89
;WITH cte AS
(
SELECT datepart(ww, transactiondate) wk,
sum(CASE WHEN TransactionType = 'Deposit' THEN Amount ELSE 0 END) AS D,
sum(CASE WHEN TransactionType = 'Withdrawals' THEN Amount ELSE 0 END) AS W,
sum(CASE WHEN TransactionType = 'Profit' THEN Amount ELSE 0 END) AS P,
sum(CASE WHEN TransactionType = 'Stake' THEN Amount ELSE 0 END) AS S,
sum(
CASE WHEN TransactionType = 'Deposit' THEN Amount ELSE 0 END +
CASE WHEN TransactionType = 'Withdrawals' THEN Amount ELSE 0 END +
CASE WHEN TransactionType = 'Profit' THEN Amount ELSE 0 END +
CASE WHEN TransactionType = 'Stake' THEN Amount ELSE 0 END +
CASE WHEN TransactionType = 'Balance' THEN Amount ELSE 0 END) AS wkTotal
FROM transactions
GROUP BY datepart(ww, transactiondate)),
cte1 AS
(
SELECT *, row_number() over (ORDER BY wk) AS rowNum
FROM cte)
SELECT wk, d, w, p, s, wktotal
+ coalesce((SELECT sum(x.wktotal) FROM cte1 x WHERE x.rownum < m.rownum), 0) AS RunningBalance
FROM cte1 m