Adding a column to a pandas dataframe that is the sum of 3 different rows in another column, AND slides those rows down like in Excel - pandas

I have a data frame like:
a b c
0 1 2 3
1 4 5 6
2 7 8 9
3 10 11 12
4 13 14 15
5 16 17 18
6 19 20 21
7 22 23 24
8 25 26 27
I'd like to add a column d that is the sum of column a row 0, column a row 2, and column a row 5.
I figured out how to do:
df['d']=df.loc[0,'a'] + df.loc[2,'a'] + df.loc[5,'a']
But the result is a static d tied to only those rows. I'd like a dynamic d, such that column d, row 1 is the sum of column a, row 1, column a, row 3, and column a, row 6.
The end result should be:
a b c d
0 1 2 3 24
1 4 5 6 33
2 7 8 9 42
3 10 11 12 ---And so on
4 13 14 15 ---
5 16 17 18 ---
6 19 20 21 ---
7 22 23 24 ---
8 25 26 27 ---
Thanks for any help!

This is a job for shift:
df.a+df.a.shift(-2)+df.a.shift(-5)
Out[412]:
0 24.0
1 33.0
2 42.0
3 51.0
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
Name: a, dtype: float64
df['d']=df.a+df.a.shift(-2)+df.a.shift(-5)
df
Out[414]:
a b c d
0 1 2 3 24.0
1 4 5 6 33.0
2 7 8 9 42.0
3 10 11 12 51.0
4 13 14 15 NaN
5 16 17 18 NaN
6 19 20 21 NaN
7 22 23 24 NaN
8 25 26 27 NaN
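The same shift-and-sum idea generalizes if you collect the offsets in a list (the offsets 0, 2, 5 here just match the question's pattern):

```python
import pandas as pd

# Reconstruct the question's frame: a = 1,4,7,..., b and c analogous.
df = pd.DataFrame({'a': range(1, 28, 3),
                   'b': range(2, 29, 3),
                   'c': range(3, 30, 3)})

# d[i] = a[i] + a[i+2] + a[i+5]; rows that would read past the
# end of the column become NaN automatically.
offsets = [0, 2, 5]
df['d'] = sum(df['a'].shift(-k) for k in offsets)
print(df['d'].head(4).tolist())  # [24.0, 33.0, 42.0, 51.0]
```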

Related

Keep only the first value on duplicated column (set 0 to others)

Suppose I have the following situation:
A dataframe whose first column ['ID'] will eventually have duplicated values.
import pandas as pd
df = pd.DataFrame({"ID": [1,2,3,4,4,5,5,5,6,6],
"l_1": [10,12,32,45,45,20,20,20,20,20],
"l_2": [11,12,32,11,21,27,38,12,9,6],
"l_3": [5,9,32,12,21,21,18,12,8,1],
"l_4": [6,21,12,77,77,2,2,2,8,8]})
ID l_1 l_2 l_3 l_4
1 10 11 5 6
2 12 12 9 21
3 32 32 32 12
4 45 11 12 77
4 45 21 21 77
5 20 27 21 2
5 20 38 18 2
5 20 12 12 2
6 20 9 8 8
6 20 6 1 8
When duplicated IDs occur:
I need to keep only the first values for columns l_1 and l_4 (the other duplicated rows must be zero).
Columns 'l_2' and 'l_3' must stay the same.
When IDs are duplicated, the values in those rows in columns l_1 and l_4 are also duplicated.
Expected output:
ID l_1 l_2 l_3 l_4
1 10 11 5 6
2 12 12 9 21
3 32 32 32 12
4 45 11 12 77
4 0 21 21 0
5 20 27 21 2
5 0 38 18 0
5 0 12 12 0
6 20 9 8 8
6 0 6 1 0
Is there a straightforward way, using pandas or numpy, to accomplish this?
I managed to accomplish it with all these steps:
x1 = df[df.duplicated(subset=['ID'], keep=False)].copy()
x1.loc[x1.groupby('ID')['l_1'].apply(lambda x: (x.shift(1) == x)), 'l_1'] = 0
x1.loc[x1.groupby('ID')['l_4'].apply(lambda x: (x.shift(1) == x)), 'l_4'] = 0
df = df.drop_duplicates(subset=['ID'], keep=False)
df = pd.concat([df, x1])
Isn't this just:
df.loc[df.duplicated('ID'), ['l_1','l_4']] = 0
Output:
ID l_1 l_2 l_3 l_4
0 1 10 11 5 6
1 2 12 12 9 21
2 3 32 32 32 12
3 4 45 11 12 77
4 4 0 21 21 0
5 5 20 27 21 2
6 5 0 38 18 0
7 5 0 12 12 0
8 6 20 9 8 8
9 6 0 6 1 0
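The one-liner works because duplicated() defaults to keep='first', marking every repeat of an ID after the first one. A self-contained sketch with the question's data:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4, 4, 5, 5, 5, 6, 6],
                   "l_1": [10, 12, 32, 45, 45, 20, 20, 20, 20, 20],
                   "l_2": [11, 12, 32, 11, 21, 27, 38, 12, 9, 6],
                   "l_3": [5, 9, 32, 12, 21, 21, 18, 12, 8, 1],
                   "l_4": [6, 21, 12, 77, 77, 2, 2, 2, 8, 8]})

# duplicated('ID') is False for the first row of each ID and True
# for the rest, so this zeroes l_1/l_4 only on the repeats.
df.loc[df.duplicated('ID'), ['l_1', 'l_4']] = 0
print(df.loc[df['ID'] == 5, 'l_1'].tolist())  # [20, 0, 0]
```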

Difference between previous row and next row in pandas gives NaN for first value

I am new to using Pandas and I have a dataframe df as given below
A B
0 4 5
1 5 8
2 6 11
3 7 13
4 8 15
5 9 30
6 10 477
7 11 3643
8 12 33469
9 13 141409
10 14 335338
11 15 365115
I want to get the difference between each row and the previous row for column B.
I used df.set_index('A').diff(), but it gives NaN for the first row. How can I get 5 there?
A B
4 NaN
5 3.0
6 3.0
7 2.0
8 2.0
9 15.0
10 447.0
11 3166.0
12 29826.0
13 107940.0
14 193929.0
15 29777.0
Let us do
df.B.diff().fillna(df.B)
0 5.0
1 3.0
2 3.0
3 2.0
4 2.0
5 15.0
6 447.0
7 3166.0
8 29826.0
9 107940.0
10 193929.0
11 29777.0
Name: B, dtype: float64
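A runnable sketch of the trick: fillna(df.B) only touches the first slot, since every other diff is defined, and it effectively treats the first value as a difference from zero.

```python
import pandas as pd

df = pd.DataFrame({'A': range(4, 16),
                   'B': [5, 8, 11, 13, 15, 30, 477, 3643,
                         33469, 141409, 335338, 365115]})

# diff() leaves NaN in the first position; fill it with the
# original value of B so the first row reads 5.0.
d = df['B'].diff().fillna(df['B'])
print(d.head(3).tolist())  # [5.0, 3.0, 3.0]
```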

Sum of group but keep the same value for each row in pandas

How can I solve the same problem as in this link (Sum of group but keep the same value for each row in r) using pandas?
I can generate a separate df with the sum for each group and then merge the generated df with the original.
You can use groupby & transform as below to get your output.
df['sumx']=df.groupby(['ID', 'Group'],sort=False)['x'].transform('sum')
df['sumy']=df.groupby(['ID', 'Group'],sort=False)['y'].transform('sum')
df
output
ID Group x y sumx sumy
1 1 1 1 12 3 25
2 1 1 2 13 3 25
3 1 2 3 14 3 14
4 3 1 4 15 15 48
5 3 1 5 16 15 48
6 3 1 6 17 15 48
7 3 2 7 18 15 37
8 3 2 8 19 15 37
9 4 1 9 20 30 63
10 4 1 10 21 30 63
11 4 1 11 22 30 63
12 4 2 12 23 12 23
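A self-contained sketch of the transform approach, with the data reconstructed from the output table above; unlike a plain aggregation, transform('sum') broadcasts each group's sum back onto every row, so no merge is needed:

```python
import pandas as pd

df = pd.DataFrame({'ID':    [1, 1, 1, 3, 3, 3, 3, 3, 4, 4, 4, 4],
                   'Group': [1, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2],
                   'x': range(1, 13),
                   'y': range(12, 24)})

# Each row receives the sum of its (ID, Group) group, keeping
# the original row count intact.
df['sumx'] = df.groupby(['ID', 'Group'], sort=False)['x'].transform('sum')
df['sumy'] = df.groupby(['ID', 'Group'], sort=False)['y'].transform('sum')
```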

Column values of multilevel indexed DataFrame are not properly updated

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(30).reshape(6,5), index=[list('aaabbb'), list('XYZXYZ')])
print(df)
df.loc[pd.IndexSlice['a'], 3] /= 10
print(df)
From the above code I expected the table below:
0 1 2 3 4
a X 0 1 2 0.3 4
Y 5 6 7 0.8 9
Z 10 11 12 1.3 14
b X 15 16 17 18 19
Y 20 21 22 23 24
Z 25 26 27 28 29
But the actual result is as below table:
0 1 2 3 4
a X 0 1 2 NaN 4
Y 5 6 7 NaN 9
Z 10 11 12 NaN 14
b X 15 16 17 18.0 19
Y 20 21 22 23.0 24
Z 25 26 27 28.0 29
What went wrong in the code?
You need to specify the second level with : to select all values:
df.loc[pd.IndexSlice['a', :], 3] /= 10
print(df)
0 1 2 3 4
a X 0 1 2 0.3 4
Y 5 6 7 0.8 9
Z 10 11 12 1.3 14
b X 15 16 17 18.0 19
Y 20 21 22 23.0 24
Z 25 26 27 28.0 29
Solution with slice:
df.loc[(slice('a'), slice(None)), 3] /= 10
print(df)
0 1 2 3 4
a X 0 1 2 0.3 4
Y 5 6 7 0.8 9
Z 10 11 12 1.3 14
b X 15 16 17 18.0 19
Y 20 21 22 23.0 24
Z 25 26 27 28.0 29
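A minimal reproduction of the fix (a float array is used here to sidestep the int-to-float upcast on assignment, which is not part of the original question):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(30.0).reshape(6, 5),
                  index=[list('aaabbb'), list('XYZXYZ')])

# With the second level selected explicitly, the assignment aligns
# on the full MultiIndex and no NaN is introduced.
df.loc[pd.IndexSlice['a', :], 3] /= 10
print(df.loc[('a', 'Z'), 3])  # 1.3
```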

pandas: two DataFrames with MultiIndex, add together?

We know that one dataframe can be added to another by index.
But I have two dataframes with MultiIndexes, so how can I add them by level 'a' of the index?
data3.head()
c d e
a b
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
data4.head()
b d e
a c
0 2 1 3 4
5 7 6 8 9
10 12 11 13 14
15 17 16 18 19
20 22 21 23 24
25 27 26 28 29
data3 + data4
error: merging with both multi-indexes is not implemented
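No answer is shown for this question. One possible reading of "add them by level 'a'" is to collapse both frames to that level and add the columns they share (here d and e); this is a sketch under that assumption, with the data reconstructed from the heads shown above:

```python
import pandas as pd
import numpy as np

# Rebuild data3 (index levels a, b) and data4 (index levels a, c).
data3 = pd.DataFrame(np.arange(25).reshape(5, 5),
                     columns=list('abcde')).set_index(['a', 'b'])
data4 = pd.DataFrame(np.arange(30).reshape(6, 5),
                     columns=list('abcde')).set_index(['a', 'c'])

# Collapse each frame to the shared level 'a', then add the common
# columns; fill_value keeps rows present in only one frame.
result = (data3[['d', 'e']].groupby(level='a').sum()
          .add(data4[['d', 'e']].groupby(level='a').sum(), fill_value=0))
print(result.loc[0, 'd'])  # 6.0 (3 from data3 + 3 from data4)
```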