pandas devide column by another one safely ( zeros, None, str) - pandas

How to divide one column by another one zero and str safe?
I don't want to create new 'A' and 'B' cols without zeros and str for some reason. If devision is not possible, I want to get Nones.
df = pd.DataFrame({'A': [0, None, 2, 1 ,5], 'B': [1, 3,'', 'cat', 4]})
I try:
df['C'] = df['B'].divide(df['A'], fill_value=None) # error with zero devision
In fact, this works, but maybe there is more elegant way?
`df['C'] = df.apply(lambda row: row['B']/row['A'] if isinstance(row['A'], numbers.Number) and isinstance(row['B'], numbers.Number) and row['A'] != 0 else None, axis = 1) # this works perfectly but looks ugly`

Use pd.to_numeric to coerce non-numeric types:
import pandas as pd
import numpy as np
df['C'] = pd.to_numeric(df['B'], errors='coerce').divide(pd.to_numeric(df['A'], errors='coerce'))
# A B C
#0 0.0 1 inf
#1 NaN 3 NaN
#2 2.0 NaN
#3 1.0 cat NaN
#4 5.0 4 0.8
If you don't want np.inf then:
df['C'] = df.C.replace(np.inf, np.NaN)

Related

drop rows from a Pandas dataframe based on which rows have missing values in another dataframe

I'm trying to drop rows with missing values in any of several dataframes.
They all have the same number of rows, so I tried this:
model_data_with_NA = pd.concat([other_df,
standardized_numerical_data,
encode_categorical_data], axis=1)
ok_rows = ~(model_data_with_NA.isna().all(axis=1))
model_data = model_data_with_NA.dropna()
assert(sum(ok_rows) == len(model_data))
False!
As a newbie in Python, I wonder why this doesn't work? Also, is it better to use hierarchical indexing? Then I can extract the original columns from model_data.
In Short
I believe the all in ~(model_data_with_NA.isna().all(axis=1)) should be replaced with any.
The reason is that all checks here if every value in a row is missing, and any checks if one of the values is missing.
Full Example
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'a':[1, 2, 3]})
df2 = pd.DataFrame({'b':[1, np.nan]})
df3 = pd.DataFrame({'c': [1, 2, np.nan]})
model_data_with_na = pd.concat([df1, df2, df3], axis=1)
ok_rows = ~(model_data_with_na.isna().any(axis=1))
model_data = model_data_with_na.dropna()
assert(sum(ok_rows) == len(model_data))
model_data_with_na
a
b
c
0
1
1
1
1
2
nan
2
2
3
nan
nan
model_data
a
b
c
0
1
1
1

How to remove all type of nan from the dataframe.?

I had a data frame, which is shown below. I want to merge column values into one column, excluding nan values.
Image 1:
When I am using the code
df3["Generation"] = df3[df3.columns[5:]].apply(lambda x: ','.join(x.dropna()), axis=1)
I am getting results like this.
Image 2:
I suspect that these columns are of type string; thus, they are not affected by x.dropna().
One example that I made is this, which gives similar results as yours.
df = pd.DataFrame({'a': [np.nan, np.nan, 1, 2], 'b': [1, 1, np.nan, None]}).astype(str)
df.apply(lambda x: ','.join(x.dropna()))
0 nan,1.0
1 nan,1.0
2 1.0,nan
3 2.0,nan
dtype: object
-----------------
# using simple string comparing solves the problem
df.apply(lambda x: ','.join(x[x!='nan']), axis=1)
0 1.0
1 1.0
2 1.0
3 2.0
dtype: object

Replace NaN values of pandas.DataFrame based on values of other columns (according to formula)

Demo dataframe:
import pandas as pd
df = pd.DataFrame({'a': [1,None,3], 'b': [5,10,15]})
I want to replace all NaN values in a with the corresponding values in b**2, and make b NaN (shift NaN values and make some operations on them).
Desired result:
1 5
100 NaN
3 15
How is it possible with pandas?
You can get the rows you want to change using df['a'].isnull(). Then you can use that to update the columns with loc.
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, None, 3], 'b': [5, 10, 15]})
change = df['a'].isnull()
df.loc[change, ['a', 'b']] = [df.loc[change, 'b']**2, np.NaN]
print(df)
Note that the change variable is only to keep from repeating df['a'].isnull() on both sides of the assignment. You could replace it with that expression to do this in one line, but I think that looks cluttered.
Result:
a b
0 1.0 5.0
1 100.0 NaN
2 3.0 15.0

Python pandas DataFrame operations with NaN

On pandas DataFrame, I'm trying to compute percent change between two features. For example:
df = pd.DataFrame({'A': [100, 100, 100], 'B': [105, 110, 93], 'C': ['NaN', 102, 'NaN']})
I attempting to compute change between df['A'] - df['C'], but on the rows where we have 'NaN', use value from 'B' column.
Expecting result: [-5, -2, 7]
since, df['C'].loc[0] is NaN, first value is 100 - 105 (from 'B').
But second value is 100 -102.
I think simpliest is replace missing values by another column by Series.fillna:
#if need replace strings NaN to missing values np.nan
df['C'] = pd.to_numeric(df.C, errors='coerce')
s = df['A'] - df['C'].fillna(df.B)
print (s)
0 -5.0
1 -2.0
2 7.0
dtype: float64
Another idea with numpy.where and test missing values by Series.isna:
a = np.where(df.C.isna(), df['A'] - df['B'], df['A'] - df['C'])
print (a)
[-5. -2. 7.]
s = df['A'] - np.where(df.C.isna(), df['B'], df['C'])
print (s)
0 -5.0
1 -2.0
2 7.0
Name: A, dtype: float64

pandas dataframe multiplication with missing values

I have a dataframe with 2columns (floating types), but one of them has missing data represented by a string ".."
When performing a multiplication operation, an exception is raised and the whole operation is aborted.
What I try to achieve is to perform the multiplication for the float values and leave ".." for the missing ones.
2 * 6
.. * 4
should give [12, ..]
I found a naive solution consisting in replacing .. by 0 then perform the multiplication, then replace back the 0 by ..
It doesn't seem very optimized. Any other solution?
df['x'] = pd.to_numeric(df['x'], errors='coerce').fillna(0)
mg['x'] = df['x'] * df["Value"]
for col in mg.columns:
mg[col] = mg[col].apply(update)
def update(v):
if (v == 0):
return ".."
return v
You can use np.where and Series.isna:
import numpy as np
mg['x'] = np.where(df['X'].isna(), df['X'], df['X']*df['Value'])
If you want to replace the null with '..' and multiply others:
mg['x'] = np.where(df['X'].isna(), '..', df['X']*df['Value'])
So anywhere the Value of column x is null, the it remains the same, otherwise it's multiplies with the value of the corresponding row of Value column
In you solution you can also do a fillna(1):
df['x'] = pd.to_numeric(df['x'], errors='coerce').fillna(1)
mg['x'] = df['x'] * df["Value"]
This is how I tried:
df = pd.DataFrame({'X': [ 2, np.nan],
'Value': [6, 4]})
df
X Value
0 2.0 6
1 NaN 4
np.where(df['X'].isna(), df['X'], df['X']*df['Value'])
array([12., nan])