sql split yearly record into 12 monthly records - sql

I am trying to use common table expression to split an yearly record into 12 monthly records. I have to do it for next 20 years records . That means 20 rows into 600 rows (20*12=600 records).
What is the best way to do it. Can anyone help with an efficient way to do it.
Using a single table as shown below. Year 0 means current year so it should split into remaining months and year=1 means next year onward it should split into 12 (months) records
id year value
1 0 3155174.87
1 1 30423037.3
1 2 35339631.25
expected result should look like this:
Id Year Month Value Calender year
1 0 5 150 2022
1 0 6 150 2022
1 0 7 150 2022
1 0 8 150 2022
1 0 9 150 2022
1 0 10 150 2022
1 0 11 150 2022
1 0 12 150 2022
1 0 1 150 2023
1 0 2 150 2023
1 0 3 150 2023
1 0 4 150 2023
1 1 5 100 2023
1 1 6 100 2023
1 1 7 100 2023
1 1 8 100 2023
1 1 9 100 2023
1 1 10 100 2023
1 1 11 100 2023
1 1 12 100 2023
1 1 1 100 2024
1 1 2 100 2024
1 1 3 100 2024
1 1 4 100 2024

You can simply join onto a list of months, and then use a bit of arithmetic to split the Value
SELECT
t.Id,
t.Year,
v.Month,
Value = t.Value / CASE WHEN t.Year = 0 THEN 13 - MONTH(GETDATE()) ELSE 12 END
FROM YourTable t
JOIN (VALUES
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
) v(Month) ON t.year > 0 OR v.Month >= MONTH(GETDATE());
db<>fiddle

Related

Loop Record in Oracle

I have a Table called TaxAmount. It has 3 columns(ID, Year, Amount). refer the below image.
I want to divide each row into 12 months. I attached a sample image below.
I'm new in Oracle side. please help me to write a Oracle Query to display the above result.
I tried ROWNUM. But No luck.
Here's one option:
SQL> select id, year, column_value as month, amount
2 from taxamount cross join
3 table(cast(multiset(select level from dual
4 connect by level <= 12
5 ) as sys.odcinumberlist))
6 order by id, year, month;
ID YEAR MONTH AMOUNT
---------- ---------- ---------- ----------
1 2022 1 100
1 2022 2 100
1 2022 3 100
1 2022 4 100
1 2022 5 100
1 2022 6 100
1 2022 7 100
1 2022 8 100
1 2022 9 100
1 2022 10 100
1 2022 11 100
1 2022 12 100
2 2022 1 200
2 2022 2 200
2 2022 3 200
2 2022 4 200
2 2022 5 200
2 2022 6 200
2 2022 7 200
2 2022 8 200
2 2022 9 200
2 2022 10 200
2 2022 11 200
2 2022 12 200
3 2022 1 150
3 2022 2 150
3 2022 3 150
3 2022 4 150
3 2022 5 150
3 2022 6 150
3 2022 7 150
3 2022 8 150
3 2022 9 150
3 2022 10 150
3 2022 11 150
3 2022 12 150
36 rows selected.
SQL>

Pandas groupby and filtering groups with rows greater than N rows

I have a pandas df of the following format
STOCK YR MONTH DAY PRICE
AAA 2022 1 1 10
AAA 2022 1 2 11
AAA 2022 1 3 10
AAA 2022 1 4 15
AAA 2022 1 5 10
BBB 2022 1 1 5
BBB 2022 1 2 10
BBB 2022 2 1 10
BBB 2022 2 2 15
What I am looking to do is to filter this df such that I am grouping by STOCK and YR and MONTH and selecting the groups with 3 or more entries.
So the resulting df looks like
STOCK YR MONTH DAY PRICE
AAA 2022 1 1 10
AAA 2022 1 2 11
AAA 2022 1 3 10
AAA 2022 1 4 15
AAA 2022 1 5 10
Note that BBB is eliminated as it had only 2 rows in each group, when grouped by STOCK, YR and MONTH
I have tried df.groupby(['STOCK','YR','MONTH']).filter(lambda x: x.STOCK.nunique() > 5) but this resulted in an empty frame.
Also tried df.groupby(['STOCK','YR','MONTH']).filter(lambda x: x['STOCK','YR','MONTH'].nunique() > 5) but this resulted in a KeyError: ('STOCK', 'YR', 'MONTH')
Thanks!
Use GroupBy.transform('count'):
df[df.groupby(['STOCK', 'YR', 'MONTH'])['STOCK'].transform('count').ge(3)]
or 'size':
df[df.groupby(['STOCK', 'YR', 'MONTH'])['STOCK'].transform('size').ge(3)]
output:
STOCK YR MONTH DAY PRICE
0 AAA 2022 1 1 10
1 AAA 2022 1 2 11
2 AAA 2022 1 3 10
3 AAA 2022 1 4 15
4 AAA 2022 1 5 10
Use GroupBy.transform:
If need counts (not exclude possible NaNs):
#if need test number of unique values
df[df.groupby(['STOCK', 'YR', 'MONTH'])['STOCK'].transform('size').gt(3)]
Or:
#in large df should be slow
df.groupby(['STOCK','YR','MONTH']).filter(lambda x: len(x) > 3)

running total starting from a date column

I'm trying to get a running total as of a date. This is the data I have
Date
transaction Amount
End of Week Balance
jan 1
5
100
jan 2
3
100
jan 3
4
100
jan 4
3
100
jan 5
1
100
jan 6
3
100
I would like to find out what the daily end balance is. My thought is to get a running total from each day to the end of the week and subtract it from the end of week balance, like below
Date
transaction Amount
Running total
End of Week Balance
Balance - Running total
jan 1
5
19
100
86
jan 2
3
14
100
89
jan 3
4
11
100
93
jan 4
3
7
100
96
jan 5
1
4
100
97
jan 6
3
3
100
100
I can use
SUM(transactionAmount) OVER (Order by Date)
to get a running total, is there a way to specify that I only want the total of transactions that have taken place after the date?
You can use sum() as a window function, but accumulate in reverse:
select t.*,
(end_of_week_balance -
sum(transactionAmount) over (order by date desc)
)
from t;
If you have this example:
1> select i, sum(i) over (order by i) S from integers where i<10;
2> go
i S
----------- -----------
1 1
2 3
3 6
4 10
5 15
6 21
7 28
8 36
9 45
you can also do:
1> select i, sum(case when i>3 then i else 0 end) over (order by i) S from integers where i<10;
2> go
i S
----------- -----------
1 0
2 0
3 0
4 4
5 9
6 15
7 22
8 30
9 39

Year wise aggregation on the given condition in pandas

I have a data frame as shown below. which is a sales data of two health care product starting from December 2016 to November 2018.
product price sale_date discount
A 50 2016-12-01 5
A 50 2017-01-03 4
B 200 2016-12-24 10
A 50 2017-01-18 3
B 200 2017-01-28 15
A 50 2017-01-18 6
B 200 2017-01-28 20
A 50 2017-04-18 6
B 200 2017-12-08 25
A 50 2017-11-18 6
B 200 2017-08-21 20
B 200 2017-12-28 30
A 50 2018-03-18 10
B 300 2018-06-08 45
B 300 2018-09-20 50
A 50 2018-11-18 8
B 300 2018-11-28 35
From the above I would like to prepare below data frame
Expected Output:
product year number_of_months total_price total_discount number_of_sales
A 2016 1 50 5 1
B 2016 1 200 10 1
A 2017 12 250 25 5
B 2017 12 1000 110 5
A 2018 11 100 18 2
B 2018 11 900 130 3
Note: Please note that the data starts from Dec 2016 to Nov 2018.
So number of months in 2016 is 1, in 2017 we have full data so 12 months and 2018 we have 11 months.
First aggregate sum by years and product and then create new column for counts by months by DataFrame.insert and Series.map:
df1 =(df.groupby(['product',df['sale_date'].dt.year], sort=False).sum().add_prefix('total_')
.reset_index())
df1.insert(2,'number_of_months', df1['sale_date'].map({2016:1, 2017:12, 2018:11}))
print (df1)
product sale_date number_of_months total_price total_discount
0 A 2016 1 50 5
1 A 2017 12 250 25
2 B 2016 1 200 10
3 B 2017 12 1000 110
4 A 2018 11 100 18
5 B 2018 11 900 130
If want dynamic dictionary by minumal and maximal datetimes use:
s = pd.date_range(df['sale_date'].min(), df['sale_date'].max(), freq='MS')
d = s.year.value_counts().to_dict()
print (d)
{2017: 12, 2018: 11, 2016: 1}
df1 = (df.groupby(['product',df['sale_date'].dt.year], sort=False).sum().add_prefix('total_')
.reset_index())
df1.insert(2,'number_of_months', df1['sale_date'].map(d))
print (df1)
product sale_date number_of_months total_price total_discount
0 A 2016 1 50 5
1 A 2017 12 250 25
2 B 2016 1 200 10
3 B 2017 12 1000 110
4 A 2018 11 100 18
5 B 2018 11 900 130
For ploting is used DataFrame.set_index with DataFrame.unstack:
df2 = (df1.set_index(['sale_date','product'])[['total_price','total_discount']]
.unstack(fill_value=0))
df2.columns = df2.columns.map('_'.join)
print (df2)
total_price_A total_price_B total_discount_A total_discount_B
sale_date
2016 50 200 5 10
2017 250 1000 25 110
2018 100 900 18 130
df2.plot()
EDIT:
df1 = (df.groupby(['product',df['sale_date'].dt.year], sort=False)
.agg( total_price=('price','sum'),
total_discount=('discount','sum'),
number_of_sales=('discount','size'))
.reset_index())
df1.insert(2,'number_of_months', df1['sale_date'].map({2016:1, 2017:12, 2018:11}))
print (df1)
product sale_date number_of_months total_price total_discount \
0 A 2016 NaN 50 5
1 A 2017 NaN 250 25
2 B 2016 NaN 200 10
3 B 2017 NaN 1000 110
4 A 2018 NaN 100 18
5 B 2018 NaN 900 130
number_of_sales
0 1
1 5
2 1
3 5
4 2
5 3

Joining From Differently Formatted Tables

Table 1:
ID Year Month
-----------------
1 2018 1
2 2018 1
3 2018 1
1 2018 2
2 2018 2
3 2018 2
Table 2:
ID Year Jan Feb Mar
------------------------
1 2018 100 200 300
2 2018 200 400 300
3 2018 200 500 700
How can I join these two tables even though they are laid out differently?
I was exploring a case join but that doesn't seem to be exactly what I need.
I'd like my output to be:
ID Year Month Data
1 2018 1 100
2 2018 1 200
3 2018 1 200
1 2018 2 200
2 2018 2 400
3 2018 2 500
1 2018 3 300
2 2018 3 300
3 2018 3 700
So, firstly we get TableB in the right format:
SELECT B.ID, B.Year, B.MonthValue
INTO TableB_New
FROM TableB T
UNPIVOT
(
MonthValue FOR Month IN (Jan, Feb, Mar)
) AS B
And then you do the join. Good Luck!