Merge columns based on values in multiple columns in pandas

I have a DataFrame as follows:
Name Col2 Col3
0 A 16-1-2000 NaN
1 B 13-2-2001 NaN
2 C NaN NaN
3 D NaN 23-4-2014
4 X NaN NaN
5 Q NaN 4-5-2009
I want to make a combined column based on the data from either Col2 or Col3, such that it would give me the following output.
Name Col2 Col3 Result
0 A 16-1-2000 NaN 16-1-2000
1 B 13-2-2001 NaN 13-2-2001
2 C NaN NaN NaN
3 D NaN 23-4-2014 23-4-2014
4 X NaN NaN NaN
5 Q NaN 4-5-2009 4-5-2009
I have tried the following:
df['Result'] = np.where(df["Col2"].isnull() & df["Col3"].isnull(), np.nan, df["Col2"] if df["Col2"].notnull() else df["Col3"])
but no success.

Use combine_first or fillna:
df['new'] = df["Col2"].combine_first(df["Col3"])
#alternative
#df['new'] = df["Col2"].fillna(df["Col3"])
print (df)
Name Col2 Col3 new
0 A 16-1-2000 NaN 16-1-2000
1 B 13-2-2001 NaN 13-2-2001
2 C NaN NaN NaN
3 D NaN 23-4-2014 23-4-2014
4 X NaN NaN NaN
5 Q NaN 4-5-2009 4-5-2009
Your solution fails because the Python if/else ternary tries to evaluate the truth value of a whole Series, which is ambiguous; it should be changed to a nested np.where:
df['new'] = np.where(df["Col2"].notnull() & df["Col3"].isnull(), df["Col2"],
                     np.where(df["Col2"].isnull() & df["Col3"].notnull(), df["Col3"], np.nan))
Or numpy.select:
m1 = df["Col2"].notnull() & df["Col3"].isnull()
m2 = df["Col2"].isnull() & df["Col3"].notnull()
df['new'] = np.select([m1, m2], [df["Col2"], df["Col3"]], np.nan)
For a general solution, select all columns except the first with iloc, forward fill NaN values along the rows, and take the last column:
df['new'] = df.iloc[:, 1:].ffill(axis=1).iloc[:, -1]
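As a quick check, a minimal reproducible sketch (the column values are copied from the question; the Result/Result2 names are only illustrative):
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': list('ABCDXQ'),
                   'Col2': ['16-1-2000', '13-2-2001', np.nan, np.nan, np.nan, np.nan],
                   'Col3': [np.nan, np.nan, np.nan, '23-4-2014', np.nan, '4-5-2009']})
# Col2 wins wherever it is not NaN, otherwise fall back to Col3
df['Result'] = df['Col2'].combine_first(df['Col3'])
# general version: forward fill across the row and keep the last column
df['Result2'] = df[['Col2', 'Col3']].ffill(axis=1).iloc[:, -1]
print(df)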

Related

How to select the rows having the same ID and all missing values in another column

I have the following dataframe:
ID col_1
1 NaN
2 NaN
3 4.0
2 NaN
2 NaN
3 NaN
3 3.0
1 NaN
I need the following output:
ID col_1
1 NaN
1 NaN
2 NaN
2 NaN
2 NaN
How can I do this in pandas?
You can create a boolean mask with isna, group this mask by ID and transform it with all, then filter the rows with the help of this mask:
mask = df['col_1'].isna().groupby(df['ID']).transform('all')
df[mask].sort_values('ID')
Alternatively, you can use groupby + filter to keep only the groups where all values in col_1 are NaN, but this method should be slower than the above:
df.groupby('ID').filter(lambda g: g['col_1'].isna().all()).sort_values('ID')
ID col_1
0 1 NaN
7 1 NaN
1 2 NaN
3 2 NaN
4 2 NaN
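For completeness, a self-contained sketch of the transform('all') approach, rebuilding the example frame from the question:
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': [1, 2, 3, 2, 2, 3, 3, 1],
                   'col_1': [np.nan, np.nan, 4.0, np.nan, np.nan, np.nan, 3.0, np.nan]})
# a row is kept only if every col_1 value for that row's ID is NaN
mask = df['col_1'].isna().groupby(df['ID']).transform('all')
print(df[mask].sort_values('ID'))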
Let us try isin after groupby with all; the boolean Series s is indexed by ID, so s[s].index holds the IDs whose values are all NaN:
s = df['col_1'].isna().groupby(df['ID']).all()
df = df.loc[df.ID.isin(s[s].index.tolist())]
df
Out[73]:
ID col_1
0 1 NaN
1 2 NaN
3 2 NaN
4 2 NaN
7 1 NaN
import pandas as pd
import numpy as np
df = pd.read_excel(r"D:\Stack_overflow\test12.xlsx")
df1 = df[df['col_1'].isnull()].sort_values(by=['ID'])
I think we can simply take out the null values. (Note this keeps every row where col_1 is null, so an ID that also has non-null values, like ID 3, would still appear; the groupby approaches above filter those out.)

How to do pd.fillna() with condition

I am trying to do a fillna with an if condition.
import pandas as pd
df = pd.DataFrame(data={'a':[1,None,3,None],'b':[4,None,None,None]})
print(df)
# pseudocode: df['b'].fillna(value=0, inplace=True) only if df['a'] is None
print(df)
a b
0 1 4
1 NaN NaN
2 3 NaN
3 NaN NaN
## What I want to achieve
a b
0 1 4
1 NaN 0
2 3 NaN
3 NaN 0
Please help
You can chain both missing-value tests with & for bitwise AND and then set the matching values to 0:
df.loc[df.a.isna() & df.b.isna(), 'b'] = 0
#alternative
df.loc[df[['a', 'b']].isna().all(axis=1), 'b'] = 0
print (df)
a b
0 1.0 4.0
1 NaN 0.0
2 3.0 NaN
3 NaN 0.0
Or you can use fillna with one condition:
df.loc[df.a.isna(), 'b'] = df.b.fillna(0)
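Putting it together as a runnable sketch (same DataFrame as in the question):
import pandas as pd
df = pd.DataFrame(data={'a': [1, None, 3, None], 'b': [4, None, None, None]})
# fill b with 0 only in the rows where a is missing
df.loc[df['a'].isna(), 'b'] = df['b'].fillna(0)
print(df)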

In pandas replace consecutive 0s with NaN

I want to clean some data by replacing only CONSECUTIVE 0s in a DataFrame with NaN.
Given:
import pandas as pd
import numpy as np
d = [[1,np.NaN,3,4],[2,0,0,np.NaN],[3,np.NaN,0,0],[4,np.NaN,0,0]]
df = pd.DataFrame(d, columns=['a', 'b', 'c', 'd'])
df
a b c d
0 1 NaN 3 4.0
1 2 0.0 0 NaN
2 3 NaN 0 0.0
3 4 NaN 0 0.0
The desired result should be:
a b c d
0 1 NaN 3 4.0
1 2 0.0 NaN NaN
2 3 NaN NaN NaN
3 4 NaN NaN NaN
where columns c & d are affected but column b is NOT affected, as it only has one zero (not consecutive 0s).
I have experimented with this answer:
Replacing more than n consecutive values in Pandas DataFrame column
which is along the right lines, but that solution keeps the first 0 in a given column, which is not desired in my case.
Let us do shift with mask: flag cells that equal 0 and also equal the value directly above or below them, then mask them.
df = df.mask((df.shift().eq(df) | df.eq(df.shift(-1))) & (df == 0))
Out[469]:
a b c d
0 1 NaN 3.0 4.0
1 2 0.0 NaN NaN
2 3 NaN NaN NaN
3 4 NaN NaN NaN
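The same mask, broken into named steps as a sketch (the variable names are only illustrative):
import pandas as pd
import numpy as np
d = [[1, np.nan, 3, 4], [2, 0, 0, np.nan], [3, np.nan, 0, 0], [4, np.nan, 0, 0]]
df = pd.DataFrame(d, columns=['a', 'b', 'c', 'd'])
is_zero = df.eq(0)
same_as_above = df.eq(df.shift())     # cell equals the value directly above it
same_as_below = df.eq(df.shift(-1))   # cell equals the value directly below it
# a zero counts as consecutive if its neighbour above or below holds the same value
df = df.mask(is_zero & (same_as_above | same_as_below))
print(df)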

Compare 2 columns and replace with None if found equal

The following command replaces all values in the matching rows with NaN.
ndf.iloc[np.where(ndf.path3=='sys_bck_20190101.tar.gz')] = np.nan
What I really need to do is to replace the value of a single column called path4 if it matches column path3. This does not work:
ndf.iloc[np.where(ndf.path3==ndf.path4), ndf.path3] = np.nan
Update:
There is a pandas method "fillna" that can be used with axis='columns'.
Is there a similar method to write "NA" values to the duplicate columns?
I can do this, but it does not look Pythonic:
ndf.loc[ndf.path1==ndf.path2, 'path1'] = np.nan
ndf.loc[ndf.path2==ndf.path3, 'path2'] = np.nan
ndf.loc[ndf.path3==ndf.path4, 'path3'] = np.nan
ndf.loc[ndf.path4==ndf.filename, 'path4'] = np.nan
Update 2
Let me explain the issue:
Assuming this dataframe:
ndf = pd.DataFrame({
    'path1':[4,5,4,5,5,4],
    'path2':[4,5,4,5,5,4],
    'path3':list('abcdef'),
    'path4':list('aaabef'),
    'col':list('aaabef')
})
The expected result:
path1 path2 path3 path4 col
0 NaN 4.0 NaN NaN a
1 NaN 5.0 b NaN a
2 NaN 4.0 c NaN a
3 NaN 5.0 d NaN b
4 NaN 5.0 NaN NaN e
5 NaN 4.0 NaN NaN f
As you can see, this is the reverse of fillna, and I guess there is no easy way to do this in pandas. I have already mentioned the commands I can use; I would like to know if there is a better way to achieve this.
Use:
for c1, c2 in zip(ndf.columns, ndf.columns[1:]):
    ndf.loc[ndf[c1]==ndf[c2], c1] = np.nan
print (ndf)
path1 path2 path3 path4 col
0 NaN 4.0 NaN NaN a
1 NaN 5.0 b NaN a
2 NaN 4.0 c NaN a
3 NaN 5.0 d NaN b
4 NaN 5.0 NaN NaN e
5 NaN 4.0 NaN NaN f
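A commented sketch of the same idea, rebuilt from the sample frame in the question (the left/right names are only illustrative):
import pandas as pd
import numpy as np
ndf = pd.DataFrame({
    'path1': [4, 5, 4, 5, 5, 4],
    'path2': [4, 5, 4, 5, 5, 4],
    'path3': list('abcdef'),
    'path4': list('aaabef'),
    'col': list('aaabef')
})
# walk over neighbouring column pairs (path1/path2, path2/path3, ...) and
# blank out the left column wherever it duplicates its right-hand neighbour
for left, right in zip(ndf.columns, ndf.columns[1:]):
    ndf.loc[ndf[left] == ndf[right], left] = np.nan
print(ndf)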

Appending a list to a dataframe

I have a dataframe, let's say:
col1 col2 col3
1 x 3
1 y 4
and I have a list:
2
3
4
5
Can I append the list to the data frame like this:
col1 col2 col3
1 x 3
1 y 4
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
Thank you.
Use concat or append with the DataFrame constructor (DataFrame.append was deprecated and removed in pandas 2.0, so prefer concat on current versions):
df = pd.concat([df, pd.DataFrame([2,3,4,5], columns=['col1'])])
#alternative for older pandas
#df = df.append(pd.DataFrame([2,3,4,5], columns=['col1']))
print (df)
col1 col2 col3
0 1 x 3.0
1 1 y 4.0
0 2 NaN NaN
1 3 NaN NaN
2 4 NaN NaN
3 5 NaN NaN
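If the duplicated index labels (0, 1, 0, 1, ...) are unwanted, a small variant with ignore_index=True renumbers the rows; a sketch, assuming the same starting frame:
import pandas as pd
df = pd.DataFrame({'col1': [1, 1], 'col2': ['x', 'y'], 'col3': [3, 4]})
# ignore_index=True discards both original indexes and renumbers the rows 0..n-1
df = pd.concat([df, pd.DataFrame([2, 3, 4, 5], columns=['col1'])], ignore_index=True)
print(df)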