How can I check for NaN values in a dataframe?

I want to check whether specific columns in a dataframe contain NaN, and then remove the rows in which those columns contain NaN.
Here is my wrong code:
import numpy as np
import pandas as pd
from numpy import nan
df = pd.DataFrame(np.array([[nan, 2, 3], [nan, nan, 6], [nan, 8, 9]]),
                  columns=['a', 'b', 'c'])
for i in range(len(df.index)):
    print(type(df["b"].loc[i]))
    if df["b"].loc[i] is np.float64(nan):
        df = df.drop([i])
print(df)
But df["b"].loc[i] is np.float64(nan) evaluates to False, and the result is:
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
     a    b    c
0  NaN  2.0  3.0
1  NaN  NaN  6.0
2  NaN  8.0  9.0
I can get the result with different code, but I want to know why the code above does not work.
The working code is:
df1 = pd.DataFrame(np.array([[nan, 2, 3], [nan, nan, 6], [nan, 8, 9]]),
                   columns=['a', 'b', 'c'])
for i in range(len(df1.index)):
    if df1.isna()["b"].loc[i]:
        df1 = df1.drop([i])
print(df1)

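The reason is that the is operator tests object identity, not equality. Every call to np.float64(nan) creates a new object, so the is comparison is always False. Testing with == does not work either, because NaN is not equal to anything, including itself. Use isna() (as in your working code) or dropna() instead.
Here is a post which discusses the topic in more detail.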
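As a minimal sketch of the point above (the variable names are illustrative, and the dataframe is a simplified version of the one in the question), the following compares the three kinds of check and then drops the rows with NaN in column b without a loop:

```python
import numpy as np
import pandas as pd

value = np.float64(np.nan)

# `is` compares object identity; np.float64(np.nan) builds a new object here.
identity_check = value is np.float64(np.nan)

# `==` does not help either: NaN compares unequal to everything, itself included.
equality_check = value == value

# pd.isna (or np.isnan) is the reliable test for a missing value.
isna_check = pd.isna(value)

print(identity_check, equality_check, isna_check)

df = pd.DataFrame({"a": [np.nan, np.nan, np.nan],
                   "b": [2, np.nan, 8],
                   "c": [3, 6, 9]})

# Drop every row whose "b" value is NaN; other columns are ignored.
cleaned = df.dropna(subset=["b"])
print(cleaned)
```

Using dropna(subset=[...]) also avoids modifying the dataframe while iterating over its index.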

Related

How to merge same name column from two different dataframes?

I have four different datasets and have merged three of the dataframes correctly. The 3rd and 4th datasets share a column name. When I merge in the 4th dataset, the values of that shared column are not combined properly: the user_id values are repeated, which I don't want. Where the del_keys column shows NaN, I want to see the actual value, but that value appears in a separate row at the end of the table instead. In short, I want to merge the values of the same-named column on the basis of user_id.
In the above image you can see the problem I am getting.
My expected output should look like this, with no repeated user_id.

Use merge on the user_id column:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'user_id': [1, 2, 3, 4],
    'del_keys': [1.0, np.nan, np.nan, np.nan]
})
df2 = pd.DataFrame({
    'user_id': [3, 4, 5],
    'del_keys': [1.0, 2.0, 3.0]
})
# both frames have a del_keys column, so merge suffixes them with _x and _y
final = df1.merge(df2, on="user_id", how="outer")
Use combine_first to fill the NaN values, then drop duplicates:
final["del_keys"] = final['del_keys_y'].combine_first(final['del_keys_x'])
final = final.drop(columns=["del_keys_x", "del_keys_y"])
final = final.drop_duplicates(subset="user_id")

I'm guessing that you use pd.concat to merge the dataframes.
Some dataframes:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'user_id': [1, 2, 3],
    'del_keys': [1.0, np.nan, np.nan]
})
df2 = pd.DataFrame({
    'user_id': [3, 4, 5],
    'del_keys': [1.0, 2.0, 3.0]
})
Merge using pd.concat:
df = pd.concat([df1, df2])
>>> user_id  del_keys
0         1       1.0
1         2       NaN
2         3       NaN
0         3       1.0
1         4       2.0
2         5       3.0
Remove duplicates using DataFrame.drop_duplicates:
(
    df
    .sort_values('del_keys')
    .drop_duplicates('user_id', keep='first')
    .sort_values('user_id')
)
>>> user_id  del_keys
0         1       1.0
1         2       NaN
0         3       1.0
1         4       2.0
2         5       3.0
First, we sort the values by del_keys so that all NaNs are at the bottom of the dataframe. Then we drop the duplicates and keep the first occurrence for each user_id, which is the non-NaN one whenever one exists. Lastly, we sort by user_id to restore the original order.
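A possible alternative to the sort-and-drop approach, sketched here on the same example frames, is to group by user_id and take the first non-null value per group. This relies on GroupBy.first skipping NaN; the name combined is only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"user_id": [1, 2, 3], "del_keys": [1.0, np.nan, np.nan]})
df2 = pd.DataFrame({"user_id": [3, 4, 5], "del_keys": [1.0, 2.0, 3.0]})

# Stack both frames, then keep one row per user_id.
# GroupBy.first returns the first non-null del_keys in each group,
# or NaN when the group has no non-null value at all.
combined = (
    pd.concat([df1, df2])
    .groupby("user_id", as_index=False)
    .first()
)
print(combined)
```

This keeps user_id 2 with a NaN del_keys, as in the output above, because neither frame has a value for it.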

How to remove all types of NaN from a dataframe?

I have a dataframe, shown below. I want to merge the column values into one column, excluding NaN values.
Image 1:
When I use the code
df3["Generation"] = df3[df3.columns[5:]].apply(lambda x: ','.join(x.dropna()), axis=1)
I get results like this.
Image 2:

I suspect that these columns are of type string; thus, they are not affected by x.dropna().
Here is an example I made that gives results similar to yours.
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [np.nan, np.nan, 1, 2], 'b': [1, 1, np.nan, None]}).astype(str)
# after astype(str) the missing values are the literal string 'nan', so dropna() keeps them
df.apply(lambda x: ','.join(x.dropna()), axis=1)
0    nan,1.0
1    nan,1.0
2    1.0,nan
3    2.0,nan
dtype: object
-----------------
# using simple string comparing solves the problem
df.apply(lambda x: ','.join(x[x!='nan']), axis=1)
0    1.0
1    1.0
2    1.0
3    2.0
dtype: object
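Another option, sketched here under the assumption that the missing entries are stored as the literal string 'nan' as in the example above, is to convert those strings back to real missing values so that the original dropna() call works unchanged. The names restored and joined are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, np.nan, 1, 2],
                   "b": [1, 1, np.nan, None]}).astype(str)

# Turn the literal string "nan" back into a real missing value.
restored = df.replace("nan", np.nan)

# dropna() now removes the missing entries before the row-wise join.
joined = restored.apply(lambda row: ",".join(row.dropna()), axis=1)
print(joined)
```

If the real data uses other spellings (for example 'NaN' or 'None'), they would need to be added to the replace call.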

Replace NaN values of pandas.DataFrame based on values of other columns (according to formula)

Demo dataframe:
import pandas as pd
df = pd.DataFrame({'a': [1,None,3], 'b': [5,10,15]})
I want to replace all NaN values in a with the corresponding values in b**2, and make b NaN (shift NaN values and make some operations on them).
Desired result:
1 5
100 NaN
3 15
How is it possible with pandas?

You can get the rows you want to change using df['a'].isnull(). Then you can use that to update the columns with loc.
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, None, 3], 'b': [5, 10, 15]})
change = df['a'].isnull()
# a NaN can only be stored in a float column, so convert b first
df['b'] = df['b'].astype(float)
# fill the missing a values with b**2, then set b to NaN on the same rows
df.loc[change, 'a'] = df.loc[change, 'b'] ** 2
df.loc[change, 'b'] = np.nan
print(df)
Note that the mask is stored in the change variable before any assignment. This avoids repeating df['a'].isnull(), and it matters for correctness: once a has been filled, df['a'].isnull() would no longer select those rows.
Result:
       a     b
0    1.0   5.0
1  100.0   NaN
2    3.0  15.0
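The same result can be sketched without in-place assignment by combining Series.fillna and Series.mask. This is only an alternative formulation; it assumes b is created as a float column so that it can hold NaN, and the names missing and result are illustrative.

```python
import pandas as pd

# b is given as floats here (an assumption) so it can hold NaN without a dtype change.
df = pd.DataFrame({"a": [1, None, 3], "b": [5.0, 10.0, 15.0]})

# Rows where a is missing.
missing = df["a"].isna()

result = df.assign(
    # fillna aligns on the index, so each missing a gets its own row's b**2.
    a=df["a"].fillna(df["b"] ** 2),
    # mask replaces b with NaN wherever the condition is True.
    b=df["b"].mask(missing),
)
print(result)
```

Both new columns are computed from the original df, so the order of the two keyword arguments does not matter.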

Python pandas DataFrame operations with NaN

On a pandas DataFrame, I'm trying to compute the change between two features. For example:
df = pd.DataFrame({'A': [100, 100, 100], 'B': [105, 110, 93], 'C': ['NaN', 102, 'NaN']})
I am attempting to compute the change df['A'] - df['C'], but on the rows where C is 'NaN', I want to use the value from column 'B' instead.
Expected result: [-5, -2, 7]
Since df['C'].loc[0] is NaN, the first value is 100 - 105 (from 'B').
The second value, however, is 100 - 102.

I think the simplest approach is to replace the missing values with values from another column using Series.fillna:
# if needed, convert the 'NaN' strings to real missing values (np.nan)
df['C'] = pd.to_numeric(df.C, errors='coerce')
s = df['A'] - df['C'].fillna(df.B)
print (s)
0 -5.0
1 -2.0
2 7.0
dtype: float64
Another idea is to use numpy.where and test for missing values with Series.isna:
a = np.where(df.C.isna(), df['A'] - df['B'], df['A'] - df['C'])
print (a)
[-5. -2. 7.]
s = df['A'] - np.where(df.C.isna(), df['B'], df['C'])
print (s)
0 -5.0
1 -2.0
2 7.0
Name: A, dtype: float64
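To illustrate why the pd.to_numeric step matters for the dataframe in the question, here is a small self-contained sketch. The string 'NaN' is ordinary text as far as isna() is concerned, so it has to be converted before fillna can act on it. The names before, after, and change are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [100, 100, 100],
                   "B": [105, 110, 93],
                   "C": ["NaN", 102, "NaN"]})

# The string 'NaN' is not a missing value, so isna() does not flag it.
before = df["C"].isna().tolist()

# to_numeric converts the column to floats; the 'NaN' strings become real NaN.
# errors='coerce' would additionally map any unparseable string to NaN.
numeric_c = pd.to_numeric(df["C"], errors="coerce")
after = numeric_c.isna().tolist()

# With real NaN in place, fillna can substitute the values from B.
change = (df["A"] - numeric_c.fillna(df["B"])).tolist()
print(before, after, change)
```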

Pandas replace empty with value based on column using dictionary

I have a dataframe with a few dozen columns. I'd like to replace NaN or empty values with a specific number or string, depending on the column. Is there a dictionary approach that would work? Dictionary example below, not sure how to apply it to a dataframe. Using Python 2.7
mydict ={'ColA': -999, 'ColB': -888, 'ColC': 'TBD'}

Just use pandas.DataFrame.fillna:
import numpy as np
import pandas as pd
df = pd.DataFrame({'ColA': [1, np.nan, 3], 'ColB':[10, np.nan, 30], 'ColC':[100, np.nan, 300]})
mydict ={'ColA': -999, 'ColB': -888, 'ColC': 'TBD'}
# fillna accepts a dict that maps each column name to its own fill value
new_df = df.fillna(mydict)
print(new_df)
Output:
    ColA   ColB ColC
0    1.0   10.0  100
1 -999.0 -888.0  TBD
2    3.0   30.0  300
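The question also mentions empty values. fillna only acts on real missing values, so one possible approach, sketched here with made-up sample data in ColC, is to convert empty strings to NaN first and then apply the same dictionary:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: ColC holds text with one empty string and one None.
df = pd.DataFrame({"ColA": [1, np.nan, 3],
                   "ColB": [10, np.nan, 30],
                   "ColC": ["x", "", None]})
mydict = {"ColA": -999, "ColB": -888, "ColC": "TBD"}

# Empty strings are not missing values, so convert them to NaN before filling.
filled = df.replace("", np.nan).fillna(mydict)
print(filled)
```

Whitespace-only strings would not be matched by this replace call and would need separate handling.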
