running total starting from a date column - sql

I'm trying to get a running total as of a date. This is the data I have
Date
transaction Amount
End of Week Balance
jan 1
5
100
jan 2
3
100
jan 3
4
100
jan 4
3
100
jan 5
1
100
jan 6
3
100
I would like to find out what the daily end balance is. My thought is to get a running total from each day to the end of the week and subtract it from the end of week balance, like below
Date
transaction Amount
Running total
End of Week Balance
Balance - Running total
jan 1
5
19
100
86
jan 2
3
14
100
89
jan 3
4
11
100
93
jan 4
3
7
100
96
jan 5
1
4
100
97
jan 6
3
3
100
100
I can use
SUM(transactionAmount) OVER (Order by Date)
to get a running total, is there a way to specify that I only want the total of transactions that have taken place after the date?

You can use sum() as a window function, but accumulate in reverse:
select t.*,
(end_of_week_balance -
sum(transactionAmount) over (order by date desc)
)
from t;

If you have this example:
1> select i, sum(i) over (order by i) S from integers where i<10;
2> go
i S
----------- -----------
1 1
2 3
3 6
4 10
5 15
6 21
7 28
8 36
9 45
you can also do:
1> select i, sum(case when i>3 then i else 0 end) over (order by i) S from integers where i<10;
2> go
i S
----------- -----------
1 0
2 0
3 0
4 4
5 9
6 15
7 22
8 30
9 39

Related

sql split yearly record into 12 monthly records

I am trying to use common table expression to split an yearly record into 12 monthly records. I have to do it for next 20 years records . That means 20 rows into 600 rows (20*12=600 records).
What is the best way to do it. Can anyone help with an efficient way to do it.
Using a single table as shown below. Year 0 means current year so it should split into remaining months and year=1 means next year onward it should split into 12 (months) records
id year value
1 0 3155174.87
1 1 30423037.3
1 2 35339631.25
expected result should look like this:
Id Year Month Value Calender year
1 0 5 150 2022
1 0 6 150 2022
1 0 7 150 2022
1 0 8 150 2022
1 0 9 150 2022
1 0 10 150 2022
1 0 11 150 2022
1 0 12 150 2022
1 0 1 150 2023
1 0 2 150 2023
1 0 3 150 2023
1 0 4 150 2023
1 1 5 100 2023
1 1 6 100 2023
1 1 7 100 2023
1 1 8 100 2023
1 1 9 100 2023
1 1 10 100 2023
1 1 11 100 2023
1 1 12 100 2023
1 1 1 100 2024
1 1 2 100 2024
1 1 3 100 2024
1 1 4 100 2024
You can simply join onto a list of months, and then use a bit of arithmetic to split the Value
SELECT
t.Id,
t.Year,
v.Month,
Value = t.Value / CASE WHEN t.Year = 0 THEN 13 - MONTH(GETDATE()) ELSE 12 END
FROM YourTable t
JOIN (VALUES
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
) v(Month) ON t.year > 0 OR v.Month >= MONTH(GETDATE());
db<>fiddle

Cumulative sum in SQL using window function

QTY
STOCK
RNK
ID KEY
CUM SUM
40
35
1
1
35
20
35
2
1
0
15
35
3
1
0
58
35
4
1
0
18
35
5
1
0
40
35
1
2
35
20
35
2
2
0
15
35
3
2
0
CUM SUM should be MIN(QTY, STOCK-SUM(all rows in cumsum before the current row)) for every other row and for 1st row it should be MIN(QTY, STOCK-SUM(0))=> MIN(QTY,STOCK)
QTY
STOCK
RNK
ID KEY
CUM SUM
40
35
1
1
5
20
35
2
1
-10
15
35
3
1
-30
58
35
4
1
-7
18
35
5
1
-24
40
35
1
2
5
20
35
2
2
-10
15
35
3
2
-30
After, I tried I am getting the above output
SELECT sum(qty-stock) over (
partition by ID KEY
ORDER BY rnk
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as CUM SUM
FROM TABLE
Need to get correct cumsum value using a window function in the existing table
You may use a rolling SUM() here, using SUM() as an analytic function:
SELECT *, SUM(QTY - STOCK) OVER (PARTITION BY ID_KEY ORDER BY RNK) AS CUM_SUM
FROM yourTable
ORDER BY ID_KEY, RNK;

R - get a vector that tells me if a value of another vector is the first appearence or not

I have a data frame of sales with three columns: the code of the customer, the month the customer bought that item, and the year.
A customer can buy something in september and then in december make another purchase, so appear two times. But I'm interested in knowing the absolutely new customoers by month and year.
So I have thought in make an iteration and some checks and use the %in% function and build a boolean vector that tells me if a customer is new or not and then count by month and year with SQL using this new vector.
But I'm wondering if there's a specific function or a better way to do that.
This is an example of the data I would like to have:
date cust month new_customer
1 14975 25 1 TRUE
2 14976 30 1 TRUE
3 14977 22 1 TRUE
4 14978 4 1 TRUE
5 14979 25 1 FALSE
6 14980 11 1 TRUE
7 14981 17 1 TRUE
8 14982 17 1 FALSE
9 14983 18 1 TRUE
10 14984 7 1 TRUE
11 14985 24 1 TRUE
12 14986 22 1 FALSE
So put it more simple: the data frame is sorted by date, and I'm interested in a vector (new_customer) that tells me if the customer purchased something for the first time or not. For example customer 25 bought something the first day, and then four days later bought something again, so is not a new customer. The same can be seen with customer 17 and 22.
I create dummy data my self with id, month of numeric format, and year
dat <-data.frame(
id = c(1,2,3,4,5,6,7,8,1,3,4,5,1,2,2),
month = c(1,6,7,8,2,3,4,8,11,1,10,9,1,12,2),
year = c(2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2021,2021,2021,2021,2021)
)
id month year
1 1 1 2019
2 2 6 2019
3 3 7 2019
4 4 8 2019
5 5 2 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
9 1 11 2020
10 3 1 2020
11 4 10 2021
12 5 9 2021
13 1 1 2021
14 2 12 2021
15 2 2 2021
Then, group by id and arrange by year and month (order is meaningful). Then use filter and row_number().
dat %>%
group_by(id) %>%
arrange(year, month) %>%
filter(row_number() == 1)
id month year
<dbl> <dbl> <dbl>
1 1 1 2019
2 5 2 2019
3 2 6 2019
4 3 7 2019
5 4 8 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
Sample Code
You can change in your code according to this logic:-
Create Table:-
CREATE TABLE PURCHASE(Posting_Date DATE,Customer_Id INT,Customer_Name VARCHAR(15));
Insert Data Into Table
Posting_Date Customer_Id Customer_Name
2018-01-01 C_01 Jack
2018-02-01 C_01 Jack
2018-03-01 C_01 Jack
2018-04-01 C_02 James
2019-04-01 C_01 Jack
2019-05-01 C_01 Jack
2019-05-01 C_03 Gill
2020-01-01 C_02 James
2020-01-01 C_04 Jones
Code
WITH Date_CTE (PostingDate,CustomerID,FirstYear)
AS
(
SELECT MIN(Posting_Date) as [Date],
Customer_Id,
YEAR(MIN(Posting_Date)) as [F_Purchase_Year]
FROM PURCHASE
GROUP BY Customer_Id
)
SELECT T.[ActualYear],(CASE WHEN T.[Customer Status] = 'new' THEN COUNT(T.[Customer Status]) END) AS [New Customer]
FROM (
SELECT DISTINCT YEAR(T2.Posting_Date) AS [ActualYear],
T2.Customer_Id,
(CASE WHEN T1.FirstYear = YEAR(T2.Posting_Date) THEN 'new' ELSE 'old' END) AS [Customer Status]
FROM Date_CTE AS T1
left outer join PURCHASE AS T2 ON T1.CustomerID = T2.Customer_Id
) AS T
GROUP BY T.[ActualYear],T.[Customer Status]
Final Result
ActualYear New Customer
2018 2
2019 1
2020 1
2019 NULL
2020 NULL

SQL Method to report next period value for this period

This may already be answered, but I can't figure out the correct search terms for what I need. We store values by Year / Period for the Beginning of Month (BOM). The BOM for one month is the same value as End of Month (EOM) for the previous month. I need a way to report this as such.
So 2018-02 BOM = 2018-01 EOM.
I thought I might be able to use something simple, but it does not account for the month/year wrap at 12 months as those fields are numerical.
select yr as YEAR, (pd-1) as PERIOD, sum(BOM) as EOM
from Table1
where type = '3'
group by yr, pd
order by yr desc, pd desc
This works for the middle months, but not for January, which becomes 2018-0 instead of 2017-12.
Example Data
Yr Pd Type BOM
18 02 3 100
18 02 3 100
18 02 2 200
18 02 2 100
18 01 3 100
18 01 3 100
18 01 2 200
18 01 2 100
18 01 3 100
18 01 2 300
17 12 3 100
17 12 3 200
17 12 2 300
17 12 3 200
17 12 2 100
17 11 3 300
17 11 2 400
17 11 3 400
17 11 2 100
So the results I am looking for would be:
Yr Pd EOM
18 01 200
17 12 300
17 11 500
17 10 700
I'm working in System iNavigator currently, but hoping to move this into an externally connected Excel query at some point.
Your DB2 database should be able to use CASE WHEN
Which can be used to calculate the year and the month, depending on the month.
For example:
select
CASE WHEN pd = 1 THEN yr - 1 ELSE yr END as Yr,
CASE WHEN pd = 1 THEN 12 ELSE pd - 1 END as Pd,
SUM(BOM) as EOM
from Table1
where type = '3'
group by yr, pd
order by yr desc, pd desc

Using Sum() with multiple where clauses

I'm pretty new to this, so forgive if this has been posted (I had no idea what to even search on).
I have 2 tables, Accounts and Usage
AccountID AccountStartDate AccountEndDate
-------------------------------------------
1 12/1/2012 12/1/2013
2 1/1/2013 1/1/2014
UsageId AccountID EstimatedUsage StartDate EndDate
------------------------------------------------------
1 1 10 1/1 1/31
2 1 11 2/1 2/29
3 1 23 3/1 3/31
4 1 23 4/1 4/30
5 1 15 5/1 5/31
6 1 20 6/1 6/30
7 1 15 7/1 7/31
8 1 12 8/1 8/31
9 1 14 9/1 9/30
10 1 21 10/1 10/31
11 1 27 11/1 11/30
12 1 34 12/1 12/31
13 2 13 1/1 1/31
14 2 13 2/1 2/29
15 2 28 3/1 3/31
16 2 29 4/1 4/30
17 2 31 5/1 5/31
18 2 26 6/1 6/30
19 2 43 7/1 7/31
20 2 32 8/1 8/31
21 2 18 9/1 9/30
22 2 20 10/1 10/31
23 2 47 11/1 11/30
24 2 33 12/1 12/31
I'd like to write one query that gives me estimated usage for each month (starting now until the last month that we serve an account) for all accounts being served during that month.
The results would be as follows:
Month-Year Total Est Usage
------------------------------
Oct-12 0 (none being served)
Nov-12 0 (none being served)
Dec-12 34 (only accountid 1 being served)
Jan-13 23 (accountid 1 & 2 being served)
Feb-13 24 (accountid 1 & 2 being served)
Mar-13 51 (accountid 1 & 2 being served)
...
Dec-13 33 (only accountid 2 being served)
Jan-14 0 (none being served)
Feb-14 0 (none being served)
I'm assuming I need to sum and then do a Group By...but not really sure logically how I'd lay this out.
Revised Answer:
I've created a Months table with columns MonthID, Month with values like (201212, 12), (201301, 1), ...
I've also reorganised the usage table to have a month column rather than the start date and end date, as it makes the idea clearer.
See http://sqlfiddle.com/#!3/f57d84/6 for details
The query is now:
Select
m.MonthID,
Sum(u.EstimatedUsage) TotalEstimatedUsage
From
Accounts a
Inner Join
Usage u
On a.AccountID = u.AccountID
Inner Join
Months m
On m.MonthID Between
Year(a.AccountStartDate) * 100 + Month(a.AccountStartDate) And
Year(a.AccountEndDate) * 100 + Month(a.AccountEndDate) And
m.Month = u.Month
Group By
m.MonthID
Order By
1
Previous answer, for reference which assumed usages ranges were full dates rather than just months.
Select
Year(u.StartDate),
Month(u.StartDate),
Sum(Case When a.AccountStartDate <= u.StartDate And a.AccountEndDate >= u.EndDate Then u.EstimatedUsage Else 0 End) TotalEstimatedUsage
From
Accounts a
Inner Join
Usage u
On a.AccountID = u.AccountID
Group By
Year(u.StartDate),
Month(u.StartDate)
Order By
1, 2