How to keep True and None Value using pandas?

I have a DataFrame:
import pandas as pd
data = {'a': [1,2,3,None,4,None,2,4,5,None],'b':[6,6,6,'NaN',4,'NaN',11,11,11,'NaN']}
df = pd.DataFrame(data)
condition = (df['a']>2) | (df['a'] == None)
print(df[condition])
a b
0 1.0 6
1 2.0 6
2 3.0 6
3 NaN NaN
4 4.0 4
5 NaN NaN
6 2.0 11
7 4.0 11
8 5.0 11
9 NaN NaN
Here, I want to keep the rows where the condition is True, and I also want to keep the rows where column a is None.
Expected output is :
a b
2 3.0 6
3 NaN NaN
4 4.0 4
5 NaN NaN
7 4.0 11
8 5.0 11
9 NaN NaN
Thanks in Advance

You can add another condition with | (or). (Note: see #ALlolz's comment; you shouldn't compare a Series with np.nan, so test for missing values with isna() instead.)
condition = (df['a']>2) | (df['a'].isna())
df[condition]
a b
2 3.0 6
3 NaN NaN
4 4.0 4
5 NaN NaN
7 4.0 11
8 5.0 11
9 NaN NaN
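
To illustrate why the original condition kept no missing rows, here is a minimal sketch using only column a from the question. The variable names eq_none, eq_nan and kept are mine, not from the original post. Comparing a Series with None or np.nan by equality gives False everywhere, because NaN never compares equal to anything, while isna() tests for missing values element by element.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, None, 4, None, 2, 4, 5, None]})

# Equality tests never match a missing value: both masks are all False.
eq_none = df['a'] == None  # noqa: E711 (deliberate, to show the pitfall)
eq_nan = df['a'] == np.nan

# isna() is the element-wise test that actually detects NaN/None.
condition = (df['a'] > 2) | df['a'].isna()
kept = df[condition]
print(kept.index.tolist())  # [2, 3, 4, 5, 7, 8, 9]
```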

Related

How to get the index of the last condition and assign it to other columns

The condition is column 'A' > 0.5.
For each row, I want to find the index of the most recent row where the condition held and assign it to column 'cond_index':
A cond_index
0 0.001566 NaN
1 0.174676 NaN
2 0.553506 2
3 0.583377 3
4 0.418854 3
5 0.836482 5
6 0.927756 6
7 0.800908 7
8 0.277646 7
9 0.388323 7

Use Index.to_series, replace the index values with missing values where the condition is not met by using Series.where (with Series.gt to compare for greater than 0.5), and finally forward fill the missing values:
df['new'] = df.index.to_series().where(df['A'].gt(0.5)).ffill()
print (df)
A cond_index new
0 0.001566 NaN NaN
1 0.174676 NaN NaN
2 0.553506 2.0 2.0
3 0.583377 3.0 3.0
4 0.418854 3.0 3.0
5 0.836482 5.0 5.0
6 0.927756 6.0 6.0
7 0.800908 7.0 7.0
8 0.277646 7.0 7.0
9 0.388323 7.0 7.0
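
The same idea can be broken into separate steps, which may make it easier to follow. This is only a sketch that rebuilds column A from the values printed in the question; the intermediate names idx and marked are mine.

```python
import pandas as pd

df = pd.DataFrame({'A': [0.001566, 0.174676, 0.553506, 0.583377, 0.418854,
                         0.836482, 0.927756, 0.800908, 0.277646, 0.388323]})

# Step 1: the index as a Series, so it can be masked like ordinary data.
idx = df.index.to_series()

# Step 2: keep the index value where A > 0.5, NaN everywhere else.
marked = idx.where(df['A'].gt(0.5))

# Step 3: carry the last kept index value forward over the NaN gaps.
df['cond_index'] = marked.ffill()
print(df)
```

Rows before the first match stay NaN because there is nothing to carry forward yet, which is also why the column ends up as float rather than int.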

Conditional aggregation after rolling in pandas

I am trying to calculate a rolling mean of a specific column based on a condition in another column.
The condition is to create three different rolling means for column A, as follows -
The rolling mean of A when column B is less than 2
The rolling mean of A when column B is equal to 2
The rolling mean of A when column B is greater than 2
Consider the following df with a window size of 2
A B
0 1 2
1 2 4
2 3 4
3 4 6
4 5 1
5 6 2
The output will be the following-
rolling less rolling equal rolling greater
0 NaN NaN NaN
1 NaN 1 2
2 NaN NaN 2.5
3 NaN NaN 3.5
4 5 NaN 4
5 5 6 NaN
The main difficulty I encountered was that the rolling function is column-wise, and on the other hand, the apply function works rows-wise, but then, calculating the rolling mean is too hard-coded.
Any ideas?
Thanks a lot.

You can create your 3 columns before rolling then compute it:
out = df.join(
    df.assign(rolling_less=df.mask(df['B'] >= 2)['A'],
              rolling_equal=df.mask(df['B'] != 2)['A'],
              rolling_greater=df.mask(df['B'] <= 2)['A'])
      .filter(like='rolling').rolling(2, min_periods=1).mean())
print(out)
# Output
A B rolling_less rolling_equal rolling_greater
0 1 2 NaN 1.0 NaN
1 2 4 NaN 1.0 2.0
2 3 4 NaN NaN 2.5
3 4 6 NaN NaN 3.5
4 5 1 5.0 NaN 4.0
5 6 2 5.0 6.0 NaN
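
If there are more conditions, the masked columns can also be built from a dictionary. This is a sketch of the same approach, not a different algorithm; the names conditions and masked are mine. Each column holds A where its condition is true and NaN otherwise, and min_periods=1 lets the mean be computed from whatever non-NaN values fall inside the 2-row window.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': [2, 4, 4, 6, 1, 2]})

# One boolean mask per output column.
conditions = {
    'rolling_less': df['B'] < 2,
    'rolling_equal': df['B'] == 2,
    'rolling_greater': df['B'] > 2,
}

# A where the condition holds, NaN elsewhere.
masked = pd.DataFrame({name: df['A'].where(cond)
                       for name, cond in conditions.items()})

# The window still spans 2 rows; NaN entries are skipped by the mean.
out = df.join(masked.rolling(2, min_periods=1).mean())
print(out)
```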

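# df1 is the DataFrame from the question
def function1(ss: pd.Series):
    # the last two rows up to and including the current row
    df11 = df1.loc[:ss.name].tail(2)
    return pd.Series([
        df11.loc[lambda dd: dd.B < 2, 'A'].mean(),
        df11.loc[lambda dd: dd.B == 2, 'A'].mean(),
        df11.loc[lambda dd: dd.B > 2, 'A'].mean(),
    ], index=['rolling less', 'rolling equal', 'rolling greater'], name=ss.name)

# apply row-wise, then join the three new columns back onto df1
df1.join(pd.concat([df1.A.shift(i) for i in range(2)], axis=1)
         .apply(function1, axis=1))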
A B rolling less rolling equal rolling greater
0 1 2 NaN 1.0 NaN
1 2 4 NaN 1.0 2.0
2 3 4 NaN NaN 2.5
3 4 6 NaN NaN 3.5
4 5 1 5.0 NaN 4.0
5 6 2 5.0 6.0 NaN

How to select NaN values in pandas in specific range

I have a dataframe like this:
df = pd.DataFrame({'col1': [5,6,np.nan, np.nan,np.nan, 4, np.nan, np.nan,np.nan, np.nan,7,8,8, np.nan, 5 , np.nan]})
df:
col1
0 5.0
1 6.0
2 NaN
3 NaN
4 NaN
5 4.0
6 NaN
7 NaN
8 NaN
9 NaN
10 7.0
11 8.0
12 8.0
13 NaN
14 5.0
15 NaN
These NaN values should be replaced in the following way. The first selection should look like this.
2 NaN
3 NaN
4 NaN
5 4.0
6 NaN
7 NaN
8 NaN
9 NaN
And then these NaN values should be replaced with the only value in that selection, 4.
The second selection is:
13 NaN
14 5.0
15 NaN
and these NaN values should be replaced with 5.
With isnull() you can select the NaN values in a dataframe, but how can you filter/select these specific ranges in pandas?

This solution works if the missing values surround a single non-missing value: create unique groups, then replace within each group by forward and back filling:
# test for missing values
s = df['col1'].isna()
# label consecutive runs of missing / non-missing values
v = s.ne(s.shift()).cumsum()
# keep runs of length 1 (a single value between missing values) plus all missing values
mask = v.map(v.value_counts()).eq(1) | s
# label consecutive runs of the mask; each run is one replacement group
g = mask.ne(mask.shift()).cumsum()
# transform keeps the original index, so the result aligns with df
df['col2'] = df.groupby(g)['col1'].transform(lambda x: x.ffill().bfill())
print (df)
col1 col2
0 5.0 5.0
1 6.0 6.0
2 NaN 4.0
3 NaN 4.0
4 NaN 4.0
5 4.0 4.0
6 NaN 4.0
7 NaN 4.0
8 NaN 4.0
9 NaN 4.0
10 7.0 7.0
11 8.0 8.0
12 8.0 8.0
13 NaN 5.0
14 5.0 5.0
15 NaN 5.0

Concatenating dataframe that have different number of rows

I have a dataframe df = df[['A', 'B', 'C']] with 3 columns and 2000 rows.
Then I have another set of data with only 200 rows.
How can I add this as df['D'] so that these 200 rows appear only at the tail of the 2000 rows?
That is, df['D'] should be NaN for rows 0-1800, and rows 1801 to 2000 should hold the values.
I've been trying various ways without success... thank you
The data with 200 rows is in this format:
[[ 0.43628979]
[ 0.43454027]
[ 0.43552566]
[ 0.43542767]
[ 0.43331838]
...

I believe you need join, after replacing the index of df2 with the last index values of df1:
import numpy as np
import pandas as pd
np.random.seed(100)
df1 = pd.DataFrame(np.random.randint(10, size=(20,3)), columns=list('ABC'))
print (df1)
A B C
0 8 8 3
1 7 7 0
2 4 2 5
3 2 2 2
4 1 0 8
5 4 0 9
6 6 2 4
7 1 5 3
8 4 4 3
9 7 1 1
10 7 7 0
11 2 9 9
12 3 2 5
13 8 1 0
14 7 6 2
15 0 8 2
16 5 1 8
17 1 5 4
18 2 8 3
19 5 0 9
df2 = pd.DataFrame(np.random.randint(10, size=(2,5)), columns=list('werty'))
print (df2)
w e r t y
0 3 6 3 4 7
1 6 3 9 0 4
df2.index = df1.index[-len(df2.index):]
df = df1.join(df2)
print (df)
A B C w e r t y
0 8 8 3 NaN NaN NaN NaN NaN
1 7 7 0 NaN NaN NaN NaN NaN
2 4 2 5 NaN NaN NaN NaN NaN
3 2 2 2 NaN NaN NaN NaN NaN
4 1 0 8 NaN NaN NaN NaN NaN
5 4 0 9 NaN NaN NaN NaN NaN
6 6 2 4 NaN NaN NaN NaN NaN
7 1 5 3 NaN NaN NaN NaN NaN
8 4 4 3 NaN NaN NaN NaN NaN
9 7 1 1 NaN NaN NaN NaN NaN
10 7 7 0 NaN NaN NaN NaN NaN
11 2 9 9 NaN NaN NaN NaN NaN
12 3 2 5 NaN NaN NaN NaN NaN
13 8 1 0 NaN NaN NaN NaN NaN
14 7 6 2 NaN NaN NaN NaN NaN
15 0 8 2 NaN NaN NaN NaN NaN
16 5 1 8 NaN NaN NaN NaN NaN
17 1 5 4 NaN NaN NaN NaN NaN
18 2 8 3 3.0 6.0 3.0 4.0 7.0
19 5 0 9 6.0 3.0 9.0 0.0 4.0
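
For the exact case in the question (a single new column D filled from a 200-row array), a similar index trick should work without a second DataFrame. This is a sketch with made-up stand-in data; the name values and the zero-filled frame are assumptions, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Stand-ins for the question's data: 2000 rows of A, B, C and a (200, 1) array.
df = pd.DataFrame(np.zeros((2000, 3)), columns=list('ABC'))
values = np.arange(200, dtype=float).reshape(-1, 1)

# Label the new values with the last 200 index entries of df; assignment
# aligns on the index, so every other row of D becomes NaN.
df['D'] = pd.Series(values.ravel(), index=df.index[-len(values):])

print(df['D'].isna().sum())  # 1800
```

As with the join above, the alignment is done on index labels, so this relies on the index of df being unique.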

Boxplot with pandas and groupby

I have the following dataset sample:
0 1
0 0 0.040158
1 2 0.500642
2 0 0.005694
3 1 0.065052
4 0 0.034789
5 2 0.128495
6 1 0.088816
7 1 0.056725
8 0 -0.000193
9 2 -0.070252
10 2 0.138282
11 2 0.054638
12 2 0.039994
13 2 0.060659
14 0 0.038562
And need a box and whisker plot, grouped by column 0. I have the following:
plt.figure()
grouped = df.groupby(0)
grouped.boxplot(column=1)
plt.savefig('plot.png')
But I end up with three subplots. How can place all three on one plot?
Thanks.

In pandas version 0.16.0, you can simply do this:
df.boxplot(by='0')

I don't believe you need to use groupby.
df2 = df.pivot(columns=df.columns[0], index=df.index)
df2.columns = df2.columns.droplevel()
>>> df2
0 0 1 2
0 0.040158 NaN NaN
1 NaN NaN 0.500642
2 0.005694 NaN NaN
3 NaN 0.065052 NaN
4 0.034789 NaN NaN
5 NaN NaN 0.128495
6 NaN 0.088816 NaN
7 NaN 0.056725 NaN
8 -0.000193 NaN NaN
9 NaN NaN -0.070252
10 NaN NaN 0.138282
11 NaN NaN 0.054638
12 NaN NaN 0.039994
13 NaN NaN 0.060659
14 0.038562 NaN NaN
df2.boxplot()
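
In more recent pandas versions, passing df.index as the index argument of pivot may not be accepted, so here is a sketch of the same reshaping with named columns. The column names group and value are hypothetical; the question's columns are labelled 0 and 1.

```python
import pandas as pd

df = pd.DataFrame({
    'group': [0, 2, 0, 1, 0, 2, 1, 1, 0, 2, 2, 2, 2, 2, 0],
    'value': [0.040158, 0.500642, 0.005694, 0.065052, 0.034789, 0.128495,
              0.088816, 0.056725, -0.000193, -0.070252, 0.138282, 0.054638,
              0.039994, 0.060659, 0.038562],
})

# One column per group label; rows keep the original index, NaN elsewhere.
wide = df.pivot(columns='group', values='value')
print(wide.count().tolist())  # [5, 3, 7]
```

Calling wide.boxplot() (which needs matplotlib) then draws one box per column on a single axes, since NaN values are ignored for each box. df.boxplot(column='value', by='group') is another way to get a single plot.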
