I have a row of values in a DataFrame and want to calculate the 3-period rolling average as a new row.
existing_row 1 2 3 4 5 6 7 8 9
create_new_row 2 3 4 5 6 7 8
Use DataFrame.rolling with axis=1 and mean:
print (df)
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7 8 9
df1 = df.rolling(3, axis=1).mean()
print (df1)
0 1 2 3 4 5 6 7 8
0 NaN NaN 2.0 3.0 4.0 5.0 6.0 7.0 8.0
If you need to join it to the original, pass both to concat:
df = pd.concat([df, df1], ignore_index=True)
print (df)
0 1 2 3 4 5 6 7 8
0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
1 NaN NaN 2.0 3.0 4.0 5.0 6.0 7.0 8.0
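Note: the axis keyword in rolling is deprecated in recent pandas releases. A minimal equivalent sketch, assuming the same single-row df, transposes so the window runs down the rows and then transposes back:
# same result as df1 above, without axis=1
df1 = df.T.rolling(3).mean().T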
Use rolling(3, axis=1).mean() and append the result as a new row; DataFrame.append was removed in pandas 2.0, so use pd.concat:
out = pd.concat([df, df.rolling(3, axis=1).mean()], ignore_index=True)
print(out)
# Output
A B C D E F G H I
0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
1 NaN NaN 2.0 3.0 4.0 5.0 6.0 7.0 8.0
Setup:
df = pd.DataFrame({'A': {0: 1}, 'B': {0: 2}, 'C': {0: 3}, 'D': {0: 4}, 'E': {0: 5},
'F': {0: 6}, 'G': {0: 7}, 'H': {0: 8}, 'I': {0: 9}})
print(df)
# Output
A B C D E F G H I
0 1 2 3 4 5 6 7 8 9
Related: This question already has answers here: Make Multiple Shifted (Lagged) Columns in Pandas.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 0, 0], [4, 5, 0], [7, 7, 7], [7, 4, 5], [4, 5, 0], [7, 8, 9], [3, 2, 9], [9, 3, 6], [6, 8, 5]]),
columns=['a', 'b', 'c'],
index = ['1/1/2000', '1/1/2001', '1/1/2002', '1/1/2003', '1/1/2004', '1/1/2005', '1/1/2006', '1/1/2007', '1/1/2008'])
df['a_1'] = df['a'].shift(1)
df['a_3'] = df['a'].shift(3)
df['a_5'] = df['a'].shift(5)
df['a_7'] = df['a'].shift(7)
Above is a dummy example of how I am shifting.
Issues:
1. I need an extra line for each different shift period; can it be done in one go?
2. The df above is small; on a massive DataFrame this operation is slow. I checked other questions: most relate it to shift not being Cython-optimized. Is there a faster way (apart from numba, which a few answers do discuss)?
nums = [1, 3, 5, 7]
pd.concat([df] + [df['a'].shift(i).to_frame(f'a_{i}') for i in nums], axis=1)
result:
a b c a_1 a_3 a_5 a_7
1/1/2000 1 0 0 NaN NaN NaN NaN
1/1/2001 4 5 0 1.0 NaN NaN NaN
1/1/2002 7 7 7 4.0 NaN NaN NaN
1/1/2003 7 4 5 7.0 1.0 NaN NaN
1/1/2004 4 5 0 7.0 4.0 NaN NaN
1/1/2005 7 8 9 4.0 7.0 1.0 NaN
1/1/2006 3 2 9 7.0 7.0 4.0 NaN
1/1/2007 9 3 6 3.0 4.0 7.0 1.0
1/1/2008 6 8 5 9.0 7.0 7.0 4.0
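If you prefer to skip the intermediate frames, a sketch of the same idea with DataFrame.assign and a dict comprehension (same nums list as above):
nums = [1, 3, 5, 7]
# build all lagged columns in one statement
out = df.assign(**{f'a_{i}': df['a'].shift(i) for i in nums})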
I have the following DataFrame:
A X
Time
1 a 10
2 b 17
3 b 20
4 c 21
5 c 36
6 d 40
given by pd.DataFrame({'Time': [1, 2, 3, 4, 5, 6], 'A': ['a', 'b', 'b', 'c', 'c', 'd'], 'X': [10, 17, 20, 21, 36, 40]}).set_index('Time')
The desired output is:
Time Difference
0 2 7
1 4 1
2 6 4
The difference of 1, for example, is the result of subtracting 20 from 21 (first "c" value minus last "b" value).
I'm open to NumPy transformations as well.
Aggregate with GroupBy.agg using GroupBy.first and GroupBy.last, then subtract the shifted last values from the first values and drop the first row by position:
df = df.reset_index()
df1 = df.groupby('A',as_index=False, sort=False).agg(first=('X', 'first'),
last=('X','last'),
Time=('Time','first'))
df1['Difference'] = df1['first'].sub(df1['last'].shift(fill_value=0))
df1 = df1[['Time','Difference']].iloc[1:].reset_index(drop=True)
print (df1)
Time Difference
0 2 7
1 4 1
2 6 4
IIUC, you can pivot, ffill the columns, and compute the difference:
g = df.reset_index().groupby('A')
(df.assign(col=g.cumcount().values)
 .pivot(index='A', columns='col', values='X')
.ffill(axis=1)
.assign(Time=g['Time'].first(),
         Difference=lambda d: d[0]-d[1].shift())
 [['Time', 'Difference']].iloc[1:]
.rename_axis(index=None, columns=None)
)
output:
Time Difference
b 2 7.0
c 4 1.0
d 6 4.0
Intermediate, pivoted/ffilled dataframe:
col 0 1 Time Difference
A
a 10.0 10.0 1 NaN
b 17.0 20.0 2 7.0
c 21.0 36.0 4 1.0
d 40.0 40.0 6 4.0
Another possible solution:
(df.assign(Y = df['X'].shift())
.iloc[df.index % 2 == 0]
.assign(Difference = lambda z: z['X'] - z['Y'])
.reset_index()
.loc[:, ['Time', 'Difference']]
)
Output:
Time Difference
0 2 7.0
1 4 1.0
2 6 4.0
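Since the question mentions being open to NumPy, here is a rough sketch along the same first/last lines (not from the original answers), assuming pandas is imported as pd:
g = df.reset_index().groupby('A', sort=False)
first = g['X'].first().to_numpy()   # first X of each group, in order of appearance
last = g['X'].last().to_numpy()     # last X of each group
out = pd.DataFrame({'Time': g['Time'].first().to_numpy()[1:],
                    'Difference': first[1:] - last[:-1]})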
I have a data frame as given below
df = pd.DataFrame({'key': ['a', 'a', 'a', 'b', 'c', 'c'] , 'val' : [10, np.nan, 9 , 10, 11, 13]})
df
key val
0 a 10.0
1 a NaN
2 a 9.0
3 b 10.0
4 c 11.0
5 c 13.0
I want to perform a groupby and transform so that the new column is each value divided by its group mean, which I can do as below:
df['new'] = df.groupby('key')['val'].transform(lambda g : g/g.mean())
df.new
0 1.052632
1 NaN
2 0.947368
3 1.000000
4 0.916667
5 1.083333
Name: new, dtype: float64
Now I have a condition: if val is np.nan, the new column value should be np.inf, which should give the result below:
0 1.052632
1 np.inf
2 0.947368
3 1.000000
4 0.916667
5 1.083333
Name: new, dtype: float64
In other words, how can I apply this check for val being np.nan together with groupby and transform?
Thanks in advance
Add Series.replace:
df['new'] = (df.groupby('key')['val'].transform(lambda g : g/g.mean())
.replace(np.nan, np.inf))
print (df)
key val new
0 a 10.0 1.052632
1 a NaN inf
2 a 9.0 0.947368
3 b 10.0 1.000000
4 c 11.0 0.916667
5 c 13.0 1.083333
Or numpy.where:
df['new'] = np.where(df.val.isna(),
np.inf, df.groupby('key')['val'].transform(lambda g : g/g.mean()))
print (df)
key val new
0 a 10.0 1.052632
1 a NaN inf
2 a 9.0 0.947368
3 b 10.0 1.000000
4 c 11.0 0.916667
5 c 13.0 1.083333
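A third option, as a sketch: Series.fillna behaves the same here, assuming the only NaNs in the transformed result come from missing val (an all-NaN group would also be filled with inf):
df['new'] = (df.groupby('key')['val'].transform(lambda g : g/g.mean())
               .fillna(np.inf))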
I have three columns of data: the first column is a category id, and the second and third columns have some missing values. I want to group by the id in the first column and, within each group, fill the missing values in the third column with the 'ffill' method.
I found a good idea here: Pandas: filling missing values by weighted average in each group!, but it didn't solve my problem because the output it produced was not what I wanted.
Run the following code to build an example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'name': ['A','A', 'B','B','B','B', 'C','C','C'],'value': [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
'sss':[1, np.nan, 3, np.nan, np.nan, np.nan, 2, np.nan, np.nan]})
Out[13]:
name value sss
0 A 1.0 1.0
1 A NaN NaN
2 B NaN 3.0
3 B 2.0 NaN
4 B 3.0 NaN
5 B 1.0 NaN
6 C 3.0 2.0
7 C NaN NaN
8 C 3.0 NaN
I want to fill in missing values with the previous value after grouping.
Then I ran the following code, but it outputs strange results:
df["sss"] = df.groupby("name").transform(lambda x: x.fillna(axis = 0,method = 'ffill'))
df
Out[13]:
name value sss
0 A 1.0 1.0
1 A NaN 1.0
2 B NaN NaN
3 B 2.0 2.0
4 B 3.0 3.0
5 B 1.0 1.0
6 C 3.0 3.0
7 C NaN 3.0
8 C 3.0 3.0
The result I want is this:
Out[13]:
name value sss
0 A 1.0 1.0
1 A NaN 1.0
2 B NaN 3.0
3 B 2.0 3.0
4 B 3.0 3.0
5 B 1.0 3.0
6 C 3.0 2.0
7 C NaN 2.0
8 C 3.0 2.0
Can someone point out where I am wrong?
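Not part of the original thread, but a likely fix as a sketch: the transform above is applied to every non-grouping column (both value and sss), and what ended up in sss here is the ffilled value column rather than the ffilled sss column. Selecting sss before filling, with GroupBy.ffill, gives the desired output:
# forward-fill sss within each name group, keeping the original index
df['sss'] = df.groupby('name')['sss'].ffill()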
I am trying to add a new column to a pandas DataFrame after a groupby and rolling average, but the newly generated column changes order after reset_index().
original dataframe
Name Values
0 A 1
1 A 2
2 A 3
3 B 1
4 B 2
5 C 3
6 A 2
7 A 6
8 B 8
9 B 3
10 D 0
after groupby and rolling it looks something like:
Name
A 0 NaN
1 NaN
2 2.000000
6 2.333333
7 3.666667
B 3 NaN
4 NaN
8 3.666667
9 4.333333
C 5 NaN
D 10 NaN
Name: Values, dtype: float64
Now, can someone help me add this result as a new column in the original DataFrame? When I try reset_index(), the order changes to the groupby order.
Use apply to apply the rolling mean on each group:
df['rolling_mean'] = df.groupby('Name').Values.apply(lambda x: x.rolling(3).mean())
df
Name Values rolling_mean
0 A 1 NaN
1 A 2 NaN
2 A 3 2.000000
3 B 1 NaN
4 B 2 NaN
5 C 3 NaN
6 A 2 2.333333
7 A 6 3.666667
8 B 8 3.666667
9 B 3 4.333333
10 D 0 NaN
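In newer pandas versions, GroupBy.apply may prepend the group key to the result's index, which breaks this direct column assignment; a sketch that keeps the original index alignment is to use GroupBy.transform instead:
df['rolling_mean'] = df.groupby('Name')['Values'].transform(lambda x: x.rolling(3).mean())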
Here is an example:
df = pd.DataFrame({'Name': {0: 'A',
1: 'A',
2: 'A',
3: 'B',
4: 'B',
5: 'C',
6: 'A',
7: 'A',
8: 'B',
9: 'B',
10: 'D'},
'Values': {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3, 6: 2, 7: 6, 8: 8, 9: 3, 10: 0}})
df2 = pd.DataFrame({2: {('A', 0): np.nan,
('A', 1): np.nan,
('A', 2): 2.0,
('A', 6): 2.333333,
('A', 7): 3.666667,
('B', 3): np.nan,
('B', 4): np.nan,
('B', 8): 3.666667,
('B', 9): 4.333333,
('C', 5): np.nan,
('D', 10): np.nan}})
df.merge(df2.reset_index(level=0), left_index=True, right_index=True)
Name Values 0 2
0 A 1 A NaN
1 A 2 A NaN
2 A 3 A 2.000000
3 B 1 B NaN
4 B 2 B NaN
5 C 3 C NaN
6 A 2 A 2.333333
7 A 6 A 3.666667
8 B 8 B 3.666667
9 B 3 B 4.333333
10 D 0 D NaN
or join:
df.join(df2.reset_index(level=0))
Name Values 0 2
0 A 1 A NaN
1 A 2 A NaN
2 A 3 A 2.000000
3 B 1 B NaN
4 B 2 B NaN
5 C 3 C NaN
6 A 2 A 2.333333
7 A 6 A 3.666667
8 B 8 B 3.666667
9 B 3 B 4.333333
10 D 0 D NaN
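Another way, as a sketch: the inner level of df2's index is the original row index, so you can drop the group level and let index alignment place the values in a new column:
# column 2 holds the rolling means; dropping the Name level aligns them back to df's index
df['rolling_mean'] = df2[2].droplevel(0)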