Sort column by absolute value with pandas

I am trying to sort this dataframe on abs(C):
A B C
0 10.3 11.3 -0.72
1 16.2 10.9 -0.84
2 18.1 15.2 0.64
3 12.2 11.3 0.31
4 17.2 12.2 -0.75
5 11.6 15.4 -0.08
6 16.0 10.4 0.05
7 18.8 14.7 -0.61
8 12.6 16.3 0.85
9 11.6 10.8 0.93
To do that, I currently append a new column D = abs(C) and then sort on D:
df['D'] = abs(df['C'])
df.sort_values(by=['D'])
Is there a way to do this in a single step?

Use Series.abs to get the absolute values, Series.argsort to get the positions that would sort them, and DataFrame.iloc to reorder the rows by those positions:
df2 = df.iloc[df.C.abs().argsort()]
print (df2)
A B C
6 16.0 10.4 0.05
5 11.6 15.4 -0.08
3 12.2 11.3 0.31
7 18.8 14.7 -0.61
2 18.1 15.2 0.64
0 10.3 11.3 -0.72
4 17.2 12.2 -0.75
1 16.2 10.9 -0.84
8 12.6 16.3 0.85
9 11.6 10.8 0.93
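If your pandas version predates the key parameter of sort_values, the helper-column approach from the question can also be written as one chained expression. A minimal sketch, assuming the question's data; the temporary column name D is only illustrative, and the original df is left unmodified:

```python
import pandas as pd

# The question's data (columns A, B, C)
df = pd.DataFrame({
    'A': [10.3, 16.2, 18.1, 12.2, 17.2, 11.6, 16.0, 18.8, 12.6, 11.6],
    'B': [11.3, 10.9, 15.2, 11.3, 12.2, 15.4, 10.4, 14.7, 16.3, 10.8],
    'C': [-0.72, -0.84, 0.64, 0.31, -0.75, -0.08, 0.05, -0.61, 0.85, 0.93],
})

# assign() returns a copy with the temporary helper column D = abs(C),
# sort on it, then drop it again so only A, B, C remain
df2 = (df.assign(D=df['C'].abs())
         .sort_values('D')
         .drop(columns='D'))
print(df2)
```

The row order matches the argsort/iloc result above.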

(From my answer in another post:)
A simple solution for pandas >= 1.1.0:
Use the key parameter of sort_values, which applies a function to the column before sorting:
import pandas as pd
df = pd.DataFrame({'a': ['a', 'b', 'c', 'd', 'e', 'f'], 'b': [-3, -2, -1, 0, 1, 2]})
df.sort_values(by='b', key=abs)
will yield:
a b
3 d 0
2 c -1
4 e 1
1 b -2
5 f 2
0 a -3
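Applied to the question's column C, a sketch of the same key-based sort in both directions (assuming pandas >= 1.1.0; only column C is rebuilt here for brevity):

```python
import pandas as pd  # sort_values(key=...) needs pandas >= 1.1.0

# Column C from the question; A and B are omitted for brevity
df = pd.DataFrame({'C': [-0.72, -0.84, 0.64, 0.31, -0.75,
                         -0.08, 0.05, -0.61, 0.85, 0.93]})

# key receives the 'C' column as a Series and must return a Series of the
# same length; the built-in abs() works because Series implements __abs__.
# Only the ordering uses abs(C); the stored values keep their signs.
asc = df.sort_values(by='C', key=abs)
desc = df.sort_values(by='C', key=abs, ascending=False)
print(asc)
print(desc)
```

Unlike the argsort version, no slicing with [::-1] is needed for descending order.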

import pandas as pd
ttt = pd.DataFrame({'a': ['a', 'b', 'c', 'd', 'e', 'f'], 'b': [-3, -2, -1, 0, 1, 2]})
# ascending order
ttt_as = ttt.iloc[ttt.b.abs().argsort()]
print (ttt_as)
# descending order
ttt_des = ttt.iloc[ttt.b.abs().argsort()][::-1]
print (ttt_des)
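A label-based variation of the same idea: sort the absolute values themselves and use the resulting index labels with .loc instead of positions with .iloc. This sketch assumes the index labels are unique, since duplicate labels would make .loc return extra rows. With the default sort algorithm the relative order of ties such as -2 and 2 is not guaranteed.

```python
import pandas as pd

ttt = pd.DataFrame({'a': ['a', 'b', 'c', 'd', 'e', 'f'],
                    'b': [-3, -2, -1, 0, 1, 2]})

# Sort |b| in descending order and keep only the resulting index labels
order = ttt['b'].abs().sort_values(ascending=False).index

# Pull the rows out in that label order (requires a unique index)
ttt_des = ttt.loc[order]
print(ttt_des)
```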

Related

groupby transform with if condition in pandas

I have a DataFrame as given below:
df = pd.DataFrame({'key': ['a', 'a', 'a', 'b', 'c', 'c'] , 'val' : [10, np.nan, 9 , 10, 11, 13]})
df
key val
0 a 10.0
1 a NaN
2 a 9.0
3 b 10.0
4 c 11.0
5 c 13.0
I want to use groupby and transform to create a new column in which each value is divided by its group mean, which I can do as follows:
df['new'] = df.groupby('key')['val'].transform(lambda g : g/g.mean())
df.new
0 1.052632
1 NaN
2 0.947368
3 1.000000
4 0.916667
5 1.083333
Name: new, dtype: float64
Now I have a condition: if val is np.nan, the new column value should be np.inf, giving the following result:
0 1.052632
1 np.inf
2 0.947368
3 1.000000
4 0.916667
5 1.083333
Name: new, dtype: float64
In other words, how can I check whether val is np.nan inside groupby and transform?
Thanks in advance.
Add Series.replace to turn the NaN results into np.inf:
df['new'] = (df.groupby('key')['val'].transform(lambda g : g/g.mean())
.replace(np.nan, np.inf))
print (df)
key val new
0 a 10.0 1.052632
1 a NaN inf
2 a 9.0 0.947368
3 b 10.0 1.000000
4 c 11.0 0.916667
5 c 13.0 1.083333
Or use numpy.where:
df['new'] = np.where(df.val.isna(),
np.inf, df.groupby('key')['val'].transform(lambda g : g/g.mean()))
print (df)
key val new
0 a 10.0 1.052632
1 a NaN inf
2 a 9.0 0.947368
3 b 10.0 1.000000
4 c 11.0 0.916667
5 c 13.0 1.083333
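A further variant, sketched without the lambda: transform('mean') broadcasts each group's mean back to the original rows, and fillna replaces the NaN quotients with np.inf. Like the replace version, this turns every NaN in the result into inf, not only those coming from a NaN val.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'a', 'a', 'b', 'c', 'c'],
                   'val': [10, np.nan, 9, 10, 11, 13]})

# Group mean aligned to each row; NaN values are skipped in the mean
group_mean = df.groupby('key')['val'].transform('mean')

# A NaN val gives a NaN quotient, which fillna then sets to inf
df['new'] = (df['val'] / group_mean).fillna(np.inf)
print(df)
```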

Pandas concatenate dataframe with multiindex retaining index names

I have a list of DataFrames, each of which looks like this:
dfList[0]
monthNum 1 2
G1
2.0 0.05 -0.16
3.0 1.17 0.07
4.0 9.06 0.83
dfList[1]
monthNum 1 2
G2
21.0 0.25 0.26
31.0 1.27 0.27
41.0 9.26 0.23
dfList[0].index
Float64Index([2.0, 3.0, 4.0], dtype='float64', name='G1')
dfList[0].columns
Int64Index([1, 2], dtype='int64', name='monthNum')
I am trying to achieve the following in a dataframe Final_Combined_DF:
monthNum 1 2
G1
2.0 0.05 -0.16
3.0 1.17 0.07
4.0 9.06 0.83
G2
21.0 0.25 0.26
31.0 1.27 0.27
41.0 9.26 0.23
I tried doing different combinations of:
pd.concat(dfList, axis=0)
but it has not given me the desired output, and I am not sure how to go about this.
We can pass keys to pd.concat, built from each DataFrame's Index.name, to add a new outer index level to the combined frame:
final_combined_df = pd.concat(
df_list, keys=map(lambda d: d.index.name, df_list)
)
final_combined_df:
monthNum 0 1
G1 2.0 4 7
3.0 7 1
4.0 9 5
G2 21.0 8 1
31.0 1 8
41.0 2 6
Setup Used:
import numpy as np
import pandas as pd
np.random.seed(5)
df_list = [
pd.DataFrame(np.random.randint(1, 10, (3, 2)),
columns=pd.Index([0, 1], name='monthNum'),
index=pd.Index([2.0, 3.0, 4.0], name='G1')),
pd.DataFrame(np.random.randint(1, 10, (3, 2)),
columns=pd.Index([0, 1], name='monthNum'),
index=pd.Index([21.0, 31.0, 41.0], name='G2'))
]
df_list:
[monthNum 0 1
G1
2.0 4 7
3.0 7 1
4.0 9 5,
monthNum 0 1
G2
21.0 8 1
31.0 1 8
41.0 2 6]
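If the levels of the resulting MultiIndex should carry names, pd.concat also accepts a names argument. A sketch with deterministic data in place of the random setup; the level names 'group' and 'value' are hypothetical choices:

```python
import numpy as np
import pandas as pd

df_list = [
    pd.DataFrame(np.arange(6).reshape(3, 2),
                 columns=pd.Index([1, 2], name='monthNum'),
                 index=pd.Index([2.0, 3.0, 4.0], name='G1')),
    pd.DataFrame(np.arange(6, 12).reshape(3, 2),
                 columns=pd.Index([1, 2], name='monthNum'),
                 index=pd.Index([21.0, 31.0, 41.0], name='G2')),
]

# keys= adds an outer level labelled with each frame's index name;
# names= labels both levels of the resulting MultiIndex
combined = pd.concat(df_list,
                     keys=[d.index.name for d in df_list],
                     names=['group', 'value'])
print(combined)

# Each original block can be selected again by its key
print(combined.loc['G2'])
```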

Python Pandas Where Condition Is Not Working

I have created a filter condition in pandas:
filter = data['Ueber'] > 2.3
data[filter]
Here you can see my dataset.
Saison Spieltag Heimteam ... Ueber Unter UeberUnter
0 1819 3 Bayern München ... 1.30 3.48 Ueber
1 1819 3 Werder Bremen ... 1.75 2.12 Unter
2 1819 3 SC Freiburg ... 2.20 1.69 Ueber
3 1819 3 VfL Wolfsburg ... 2.17 1.71 Ueber
4 1819 3 Fortuna Düsseldorf ... 1.46 2.71 Ueber
Unfortunately, my greater than condition is not working. What's the problem?
Thanks
Just for the sake of clarity: if the column really contains floats, the comparison should work.
Example DataFrame:
>>> df = pd.DataFrame({'num': [-12.5, 60.0, 50.0, -25.10, 50.0, 51.0, 71.0]} , dtype=float)
>>> df
num
0 -12.5
1 60.0
2 50.0
3 -25.1
4 50.0
5 51.0
6 71.0
The conditional check returns a boolean mask:
>>> df['num'] > 50.0
0 False
1 True
2 False
3 False
4 False
5 True
6 True
Name: num, dtype: bool
Result:
>>> df[df['num'] > 50.0]
num
1 60.0
5 51.0
6 71.0
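To confirm that the column really holds floats, check its dtype and convert it if needed. One possible cause is that the numbers were read as text; the sketch below assumes hypothetical string data with a decimal comma and converts it with pd.to_numeric before filtering. Note also that none of the five Ueber values shown in the question exceeds 2.3 (the largest is 2.20), so an empty result is what the filter should return for those rows.

```python
import pandas as pd

# Hypothetical data: numbers stored as text with a decimal comma
data = pd.DataFrame({'Ueber': ['1,30', '1,75', '2,20', '2,45']})
print(data['Ueber'].dtype)  # a text dtype (object), not float64

# Replace the decimal comma with a point and convert to floats;
# errors='coerce' turns unparsable entries into NaN instead of raising
data['Ueber'] = pd.to_numeric(data['Ueber'].str.replace(',', '.', regex=False),
                              errors='coerce')

mask = data['Ueber'] > 2.3
print(data[mask])
```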

Take product of columns in dataframe with lags

I have the following DataFrame:
A = pd.Series([2, 3, 4, 5], index=[1, 2, 3, 4])
B = pd.Series([6, 7, 8, 9], index=[1, 2, 3, 4])
Aw = pd.Series([0.25, 0.3, 0.33, 0.36], index=[1, 2, 3, 4])
Bw = pd.Series([0.75, 0.7, 0.67, 0.65], index=[1, 2, 3, 4])
df = pd.DataFrame({'A': A, 'B': B, 'Aw': Aw, 'Bw': Bw})
df
Index A B Aw Bw
1 2 6 0.25 0.75
2 3 7 0.30 0.70
3 4 8 0.33 0.67
4 5 9 0.36 0.65
What I would like to do is multiply 'A' by the lag of 'Aw', and likewise 'B' by the lag of 'Bw'. The resulting DataFrame should look like this:
Index A B Aw Bw A_ctr B_ctr
1 2 6 NaN NaN NaN NaN
2 3 7 0.25 0.75 0.75 5.25
3 4 8 0.3 0.7 1.2 5.6
4 5 9 0.33 0.67 1.65 6.03
Thank you in advance
To get your desired output, first shift Aw and Bw, then multiply them by A and B:
df[['Aw','Bw']] = df[['Aw','Bw']].shift()
df[['A_ctr','B_ctr']] = df[['A','B']].values*df[['Aw','Bw']]
A B Aw Bw A_ctr B_ctr
1 2 6 NaN NaN NaN NaN
2 3 7 0.25 0.75 0.75 5.25
3 4 8 0.30 0.70 1.20 5.60
4 5 9 0.33 0.67 1.65 6.03
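If you would rather keep the original Aw and Bw columns unshifted, the lag can be applied inside the multiplication instead. A sketch assuming each weight column is named after its value column plus a trailing 'w', as in the question:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 3, 4, 5], 'B': [6, 7, 8, 9],
                   'Aw': [0.25, 0.3, 0.33, 0.36],
                   'Bw': [0.75, 0.7, 0.67, 0.65]},
                  index=[1, 2, 3, 4])

# shift() lags each weight by one row, so row n uses the weight from row n-1;
# the naming convention A -> Aw, B -> Bw is assumed here
for col in ['A', 'B']:
    df[f'{col}_ctr'] = df[col] * df[f'{col}w'].shift()
print(df)
```

The first row of each _ctr column is NaN because there is no earlier weight to use.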

how to create a time series column and reindex in pandas?

How to create a column and reindex in pandas?
I'm new to pandas. I have a 5-row DataFrame as follows:
A B C D
0 2.34 3.16 99.0 3.2
1 2.1 55.5 77.5 1
2 22.1 54 89 33
3 23 1.24 4.7 5
4 45 2.5 8.7 99
I want to replace the index 0, 1, ..., 4 with a new index 1 to 5. My expected output is:
A B C D
1 2.34 3.16 99.0 3.2
2 2.1 55.5 77.5 1
3 22.1 54 89 33
4 23 1.24 4.7 5
5 45 2.5 8.7 99
What I did was create a new column:
new_index = pd.DataFrame({'#': range(1, 5 + 1 ,1)})
Then I tried to reindex:
df.reindex(new_index)
But I got error:
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
How should I replace the existing index? Thanks.
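Use set_index (in recent pandas versions, wrap the new labels in an Index such as pd.RangeIndex, because a bare range inside a list is treated as a column label):
In [5081]: df.set_index(pd.RangeIndex(1, 6))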
Out[5081]:
A B C D
1 2.34 3.16 99.0 3.2
2 2.10 55.50 77.5 1.0
3 22.10 54.00 89.0 33.0
4 23.00 1.24 4.7 5.0
5 45.00 2.50 8.7 99.0
Or assign the new labels directly to df.index:
In [5082]: df.index = range(1, 6)
In [5083]: df
Out[5083]:
A B C D
1 2.34 3.16 99.0 3.2
2 2.10 55.50 77.5 1.0
3 22.10 54.00 89.0 33.0
4 23.00 1.24 4.7 5.0
5 45.00 2.50 8.7 99.0
Details
Original df
In [5085]: df
Out[5085]:
A B C D
0 2.34 3.16 99.0 3.2
1 2.10 55.50 77.5 1.0
2 22.10 54.00 89.0 33.0
3 23.00 1.24 4.7 5.0
4 45.00 2.50 8.7 99.0
Alternatively, add 1 to the existing index values:
df.index = df.index.values + 1
df
Out[141]:
A B C D
1 2.34 3.16 99.0 3.2
2 2.10 55.50 77.5 1.0
3 22.10 54.00 89.0 33.0
4 23.00 1.24 4.7 5.0
5 45.00 2.50 8.7 99.0
As suggested by Zero, the shorter in-place form also works:
df.index += 1
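The ValueError in the question most likely comes from passing a whole DataFrame (new_index) where reindex expects one-dimensional labels. Even with 1-D labels, though, reindex looks up the given labels in the existing index rather than renaming them. A sketch contrasting the two, assuming pandas >= 1.0 (where set_axis returns a new object by default) and using only columns A and B for brevity:

```python
import pandas as pd

df = pd.DataFrame({'A': [2.34, 2.1, 22.1, 23, 45],
                   'B': [3.16, 55.5, 54, 1.24, 2.5]})

# reindex() conforms to the given labels: label 0 is dropped and
# label 5, which does not exist yet, becomes a row of NaN
conformed = df.reindex(range(1, 6))

# set_axis() only replaces the labels; every row is kept
relabeled = df.set_axis(range(1, 6))

print(conformed)
print(relabeled)
```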