Cumulative Deviation of 2 Columns in Pandas DF

I have a rather simple request and have not found a suitable solution online. I have a DF that looks like this below and I need to find the cumulative deviation as shown in a new column to the DF. My DF looks like this:
year month Curr Yr LT Avg
0 2022 1 667590.5985 594474.2003
1 2022 2 701655.5967 585753.1173
2 2022 3 667260.5368 575550.6112
3 2022 4 795338.8914 562312.5309
4 2022 5 516510.1103 501330.4306
5 2022 6 465717.9192 418087.1358
6 2022 7 366100.4456 344854.2453
7 2022 8 355089.157 351539.9371
8 2022 9 468479.4396 496831.2979
9 2022 10 569234.4156 570767.1723
10 2022 11 719505.8569 594368.6991
11 2022 12 670304.78 576495.7539
And, I need the cumulative deviation new column in this DF to look like this:
Cum Dev
0.122993392
0.160154637
0.159888559
0.221628609
0.187604073
0.178089327
0.16687643
0.152866293
0.129326033
0.114260993
0.124487107
0.128058305
In Excel, with the data in columns Z3:Z14 and AA3:AA14, the formula for the first row would be =SUM(Z$3:Z3)/SUM(AA$3:AA3)-1, for the next row =SUM(Z$3:Z4)/SUM(AA$3:AA4)-1, and so on, with the last row being =SUM(Z$3:Z14)/SUM(AA$3:AA14)-1.
Thank you kindly for your help,

You can divide the cumulative sums of those 2 columns element-wise, and then subtract 1 at the end:
>>> (df["Curr Yr"].cumsum() / df["LT Avg"].cumsum()) - 1
0 0.122993
1 0.160155
2 0.159889
3 0.221629
4 0.187604
5 0.178089
6 0.166876
7 0.152866
8 0.129326
9 0.114261
10 0.124487
11 0.128058
dtype: float64
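As a self-contained sketch (reconstructing the frame from the values shown in the question), you can assign the result straight to a new column:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "year": [2022] * 12,
    "month": range(1, 13),
    "Curr Yr": [667590.5985, 701655.5967, 667260.5368, 795338.8914,
                516510.1103, 465717.9192, 366100.4456, 355089.157,
                468479.4396, 569234.4156, 719505.8569, 670304.78],
    "LT Avg": [594474.2003, 585753.1173, 575550.6112, 562312.5309,
               501330.4306, 418087.1358, 344854.2453, 351539.9371,
               496831.2979, 570767.1723, 594368.6991, 576495.7539],
})

# Mirror of the Excel formula =SUM(Z$3:Zn)/SUM(AA$3:AAn)-1:
# running sum of one column divided by the running sum of the other.
df["Cum Dev"] = df["Curr Yr"].cumsum() / df["LT Avg"].cumsum() - 1
print(df[["year", "month", "Cum Dev"]])
```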

Related

Rolling Rows in pandas.DataFrame

I have a dataframe that looks like this:
year month valueCounts
2019 1 73.411285
2019 2 53.589128
2019 3 71.103842
2019 4 79.528084
I want valueCounts column's values to be rolled like:
year month valueCounts
2019 1 53.589128
2019 2 71.103842
2019 3 79.528084
2019 4 NaN
I can do this by dropping the first row of the dataframe and assigning NaN to the last, but that doesn't look efficient. Is there a simpler method?
Thanks.
Assuming your dataframe is already sorted.
Use shift:
df['valueCounts'] = df['valueCounts'].shift(-1)
print(df)
# Output
year month valueCounts
0 2019 1 53.589128
1 2019 2 71.103842
2 2019 3 79.528084
3 2019 4 NaN
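A minimal runnable sketch of the above, rebuilding the frame from the question:

```python
import pandas as pd

# Toy frame matching the question (assumed already sorted by year/month).
df = pd.DataFrame({
    "year": [2019] * 4,
    "month": [1, 2, 3, 4],
    "valueCounts": [73.411285, 53.589128, 71.103842, 79.528084],
})

# shift(-1) moves every value up one row; the last row becomes NaN.
df["valueCounts"] = df["valueCounts"].shift(-1)
print(df)
```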

pandas reset_index of certain level removes entire level of multiindex

I have DataFrame like this:
performance
year month week
2015 1 2 4.170358
3 3.423766
4 -1.835888
5 8.157457
2 6 -3.276887
... ...
2018 7 30 -1.045241
31 -0.870845
8 31 0.950555
32 6.757876
33 -2.203334
I want to have week in range(0 or 1,n) where n = number of weeks in current year and month.
Well, the easy way, I thought, was to use
df.reset_index(level=2, drop=True)
But that was a mistake, as I realized later; in the best scenario I would get
performance
year month week
2015 1 0 4.170358
1 3.423766
2 -1.835888
3 8.157457
2 4 -3.276887
... ...
2018 7 n-4 -1.045241
n-3 -0.870845
8 n-2 0.950555
n-1 6.757876
n -2.203334
But after I did that, I got an unexpected behaviour
close
timestamp timestamp
2015 1 4.170358
1 3.423766
1 -1.835888
1 8.157457
2 -3.276887
... ...
2018 7 -1.045241
7 -0.870845
8 0.950555
8 6.757876
8 -2.203334
I lost the entire 2nd level of the index! Why? I thought it would be 0 to n for each 'cluster' (yes, that was a mistake, as I mentioned above)...
I solved my problem with something like this:
df.groupby(level = [0, 1]).apply(lambda x: x.reset_index(drop=True))
And got my desired form of DataFrame like that:
performance
year month
2015 1 0 4.170358
1 3.423766
2 -1.835888
3 8.157457
2 0 -3.276887
... ...
2018 7 3 -1.045241
4 -0.870845
8 0 0.950555
1 6.757876
2 -2.203334
But WHY? Why does reset_index on a certain level just drop it? That's the main question!
reset_index with drop=True adds a default index only when you are resetting the whole index. If you're resetting just a single level of a multi-level index, it simply removes that level.
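Here is a small sketch of both behaviors, using a cut-down stand-in for the frame above:

```python
import pandas as pd

# Simplified stand-in for the frame above: a (year, month, week) MultiIndex.
idx = pd.MultiIndex.from_tuples(
    [(2015, 1, 2), (2015, 1, 3), (2015, 1, 4), (2015, 2, 6)],
    names=["year", "month", "week"],
)
df = pd.DataFrame({"performance": [4.17, 3.42, -1.84, -3.28]}, index=idx)

# drop=True on a single level simply removes it -- no renumbering happens.
dropped = df.reset_index(level=2, drop=True)
print(dropped.index.names)  # ['year', 'month']

# To renumber 0..n-1 within each (year, month) group, reset per group:
renum = df.groupby(level=[0, 1]).apply(lambda x: x.reset_index(drop=True))
print(renum)
```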

How to join columns in Julia?

I have opened a dataframe in Julia where I have 3 columns like this:
day month year
1 1 2011
2 4 2015
3 12 2018
how can I make a new column called date that goes:
day month year date
1 1 2011 1/1/2011
2 4 2015 2/4/2015
3 12 2018 3/12/2018
I was trying with this:
df[!,:date]= df.day.*"/".*df.month.*"/".*df.year
but it didn't work.
In R I would do:
df$date=paste(df$day, df$month, df$year, sep="/")
Is there anything similar?
Thanks in advance!
Julia has a built-in Date type in its standard library, which is usually preferable to concatenating strings:
julia> using Dates
julia> df[!, :date] = Date.(df.year, df.month, df.day)
3-element Vector{Date}:
2011-01-01
2015-04-02
2018-12-03

Reshape Dataframe with Column Information as New Single Column [duplicate]

This question already has answers here:
Wide to long data transform in pandas
(3 answers)
Closed 1 year ago.
I need to reshape a df and use the "year" information as a new column after reshaping. My data looks like this for df and will potentially contain more year data and players:
index player A 2012 player B 2012 player A 2013 player B 2013
0 15 10 20 35
1 40 25 60 70
My final df needs to look like this for dfnew:
index year player A player B
0 2012 15 10
0 2013 20 35
1 2012 40 25
1 2013 60 70
I've tried multiple variations of the code below and don't have a lot of experience with this, but I don't know how to account for the changing "year" (i.e., 2012, 2013) and turn it into a new column.
df.pivot(index="index", columns=['player A','player B'])
Thank you very much,
Use wide_to_long:
df = pd.wide_to_long(df.reset_index(),
stubnames=['player A','player B'],
i='index',
j='Year',
sep=' ').reset_index(level=1).sort_index()
print (df)
Year player A player B
index
0 2012 15 10
0 2013 20 35
1 2012 40 25
1 2013 60 70
Or Series.str.rsplit by last space with DataFrame.stack:
df.columns = df.columns.str.rsplit(n=1, expand=True)
df = df.stack().rename_axis((None, 'Year')).reset_index(level=1)
print (df)
Year player A player B
0 2012 15 10
0 2013 20 35
1 2012 40 25
1 2013 60 70
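As a self-contained check of the wide_to_long approach, rebuilding the frame from the values in the question:

```python
import pandas as pd

# Frame from the question: one column per (player, year) pair.
df = pd.DataFrame({
    "player A 2012": [15, 40], "player B 2012": [10, 25],
    "player A 2013": [20, 60], "player B 2013": [35, 70],
})

# reset_index supplies the 'index' id column; sep=' ' splits off the
# numeric year suffix, which lands in the new 'Year' column.
long = pd.wide_to_long(df.reset_index(),
                       stubnames=["player A", "player B"],
                       i="index", j="Year",
                       sep=" ").reset_index(level=1).sort_index()
print(long)
```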

flatten a multi index in pandas [duplicate]

This question already has answers here:
Pandas index column title or name
(9 answers)
Closed 2 years ago.
I need to set an index on my rows, and when I do that, pandas automatically makes my column index hierarchical... I then tried every flattening method I could find, but once I reset_index, my row index is replaced with integers. If I use df.columns = [my col index names], it doesn't flatten my columns' index at all.
I use pandas official docs as example
df = pd.DataFrame({'month': [1, 4, 7, 10],
'year': [2012, 2014, 2013, 2014],
'sale': [55, 40, 84, 31]})
df.set_index('month')
and I get
year sale
month
1 2012 55
4 2014 40
7 2013 84
10 2014 31
Then I flatten the index by
df.reset_index()
Then it becomes
index month year sale
0 0 1 2012 55
1 1 4 2014 40
2 2 7 2013 84
3 3 10 2014 31
(The month row index disappeared...)
This really kills me, so I'd appreciate it if someone could help me make the dataframe into something like:
month year sale
1 2012 55
4 2014 40
7 2013 84
10 2014 31
Thanks!
You only need to
df.reset_index(drop=True)
which returns
month year sale
0 1 2012 55
1 4 2014 40
2 7 2013 84
3 10 2014 31
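A sketch of what is going on: since `df.set_index('month')` returns a new frame and is never assigned back, `df` still has its plain RangeIndex, and a bare `reset_index()` inserts that old index as an extra 'index' column. `drop=True` discards the current index instead of inserting it:

```python
import pandas as pd

df = pd.DataFrame({"month": [1, 4, 7, 10],
                   "year": [2012, 2014, 2013, 2014],
                   "sale": [55, 40, 84, 31]})

# set_index returns a NEW frame; without assignment, df is unchanged.
indexed = df.set_index("month")

# reset_index() without drop=True turns the current index into a column.
# On df (plain RangeIndex) that adds the unwanted 'index' column:
print(df.reset_index().columns.tolist())      # ['index', 'month', 'year', 'sale']

# drop=True discards the current index instead of inserting it:
print(df.reset_index(drop=True).columns.tolist())  # ['month', 'year', 'sale']

# On the month-indexed frame, a plain reset_index restores month as a column:
print(indexed.reset_index().columns.tolist())      # ['month', 'year', 'sale']
```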