Dividing a column that contains blanks in pandas

I have:
df = pd.DataFrame({'a': [10, 10, 10], 'b': [10, '', 0]})
df
    a   b
0  10  10
1  10
2  10   0
I want:
df['c'] = df['a'] / df['b']
but I get the error:
TypeError: unsupported operand type(s) for /: 'int' and 'str'
I need the blank to stay blank in the result, i.e. I want
df['c'] = [1, '', inf]
Any suggestions?

Replace the blank with NaN, divide, then fill it back:
(df['a'] / df['b'].replace('', np.nan)).fillna('')
Out[162]:
0      1
1
2    inf
dtype: object
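For completeness, a self-contained sketch of the same idea using pd.to_numeric for the coercion (a variation on the one-liner above, not the answer's exact code; b_num is just a throwaway name):
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [10, 10, 10], 'b': [10, '', 0]})

# Coerce b to numbers: the blank string becomes NaN
b_num = pd.to_numeric(df['b'], errors='coerce')

# 10/10 -> 1.0, 10/NaN -> NaN (refilled with ''), 10/0 -> inf
df['c'] = (df['a'] / b_num).fillna('')
print(df)   # c is [1.0, '', inf]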

Related

Pandas astype not converting column to int even when using errors='ignore'

I have the following DF:
         ID
0       1.0
1  555555.0
2       NaN
3     200.0
When I try to convert the ID column to Int64 I get the following error:
Cannot convert non-finite values (NA or inf) to integer
I used the following code to try to solve this:
df["ID"] = df["ID"].astype('int64', errors='ignore')
However, with that code the ID column stays float64.
Any tip to solve this problem?
Use the nullable pd.Int64Dtype instead of np.int64:
df['ID'] = df['ID'].fillna(pd.NA).astype(pd.Int64Dtype())
Output:
>>> df
       ID
0       1
1  555555
2    <NA>
3     200
>>> df['ID'].dtype
Int64Dtype()
>>> df['ID'] + 10
0        11
1    555565
2      <NA>
3       210
Name: ID, dtype: Int64
>>> print(df.to_csv(index=False))
ID
1
555555
""
200
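A minimal, self-contained sketch of that conversion ('Int64' is the string alias of pd.Int64Dtype(); convert_dtypes, shown here as an alternative and not part of the answer above, infers nullable dtypes for every column):
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1.0, 555555.0, np.nan, 200.0]})

# 'Int64' (capital I) is the nullable extension dtype; NaN becomes <NA>
df['ID'] = df['ID'].astype('Int64')
print(df['ID'].dtype)    # Int64

# Alternative: let pandas infer nullable dtypes for all columns at once
df2 = pd.DataFrame({'ID': [1.0, 555555.0, np.nan, 200.0]}).convert_dtypes()
print(df2['ID'].dtype)   # Int64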

replace any strings with nan in a pandas dataframe

I'm new to pandas and the DataFrame concept. Because of the format of my data (Excel sheets where the first row holds the column names and the second row holds the units), it is a little tricky to handle in a data frame.
The task is to calculate new data from existing columns, e.g. df['c'] = df['a']**2 + df['b']
I get: TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
This did work, but it is a pain to my hands and eyes:
df['c'] = df['a']
df['c'] = df['a'].tail(len(df['a']) - 1)**2 + df['b'].tail(len(df['b']) - 1)
df.loc[0, 'c'] = 'unit for c'
Is there any way to do this quicker or with less typing?
Thanks already
schamonn
Let's look at the error mentioned first in this post.
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
What this error is saying is that you are trying to raise a string to a power. We can replicate it with the following example:
df = pd.DataFrame({'a':['1','2','3'],'b':[4,5,6]})
df['a']**2
Output (the last line of the stack trace):
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
A simple fix, if all the values in column a are numeric representations, is pd.to_numeric:
pd.to_numeric(df['a'])**2
Output:
0    1
1    4
2    9
Name: a, dtype: int64
Got non-numeric strings in column a as well?
Pass errors='coerce' to pd.to_numeric:
df = pd.DataFrame({'a':['a','1','2','3'],'b':[4,5,6,7]})
Use:
pd.to_numeric(df['a'], errors='coerce')**2
Output:
0    NaN
1    1.0
2    4.0
3    9.0
Name: a, dtype: float64
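Applied back to the question's formula, a minimal sketch (the column names a, b, c come from the example above):
import pandas as pd

df = pd.DataFrame({'a': ['1', '2', '3'], 'b': [4, 5, 6]})

# Coerce the string column to numbers before doing arithmetic
df['c'] = pd.to_numeric(df['a'], errors='coerce')**2 + df['b']
print(df)   # c is [5, 9, 12]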
This is how I read in the data:
Data = pd.read_excel(fileName, sheet_name='Messung')
In [154]: Data
Out[154]:
   T1   T2 Messung                Datum
0  °C   °C       -                    -
1  12  100       1  2018-12-06 00:00:00
2  15  200       2  2018-12-06 00:00:00
3  20  120       3  2018-12-06 00:00:00
4  10  160       4  2018-12-06 00:00:00
5  12  160       5  2018-12-06 00:00:00
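Given that layout (the first data row carries the units), one possible approach is to set the unit row aside and coerce only the measurement columns; a sketch with a toy frame standing in for the Excel file:
import pandas as pd

# Toy frame shaped like Out[154]: row 0 carries the units
data = pd.DataFrame({
    'T1':      ['°C', 12, 15, 20, 10, 12],
    'T2':      ['°C', 100, 200, 120, 160, 160],
    'Messung': ['-', 1, 2, 3, 4, 5],
    'Datum':   ['-'] + ['2018-12-06'] * 5,
})

units = data.iloc[0]            # the unit row, kept aside
values = data.iloc[1:].copy()   # the actual measurements

# Coerce just the measurement columns to numbers
for col in ['T1', 'T2', 'Messung']:
    values[col] = pd.to_numeric(values[col], errors='coerce')

# Arithmetic now works without the unit row getting in the way
values['c'] = values['T1']**2 + values['T2']
print(values)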

How to index into a data frame using another data frame's indices?

I have a dataframe, num_buys_per_day
        Date  count
0 2011-01-13      1
1 2011-02-02      1
2 2011-03-03      2
3 2011-06-03      1
4 2011-08-01      1
I have another data frame commissions_buy which I'll give a small subset of:
            num_orders
2011-01-10           0
2011-01-11           0
2011-01-12           0
2011-01-13           0
2011-01-14           0
2011-01-18           0
I want to apply the following command
commissions_buy.loc[num_buys_per_day.index, :] = num_buys_per_day.values * commission
where commission is a scalar.
Note that all indices in num_buys_per_day exist in commissions_buy.
I get the following error:
TypeError: unsupported operand type(s) for *: 'Timestamp' and 'float'
What is the correct way to do this?
You first need to make the Date column the index (the Timestamp values in the Date column are what end up being multiplied by the float, hence the error):
num_buys_per_day.set_index('Date', inplace=True)
commissions_buy.loc[num_buys_per_day.index, 'num_orders'] = num_buys_per_day['count'].values * commission
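A runnable sketch of that fix with toy data (commission = 0.01 is just an assumed value for illustration):
import pandas as pd

num_buys_per_day = pd.DataFrame({
    'Date': pd.to_datetime(['2011-01-13', '2011-02-02', '2011-03-03']),
    'count': [1, 1, 2],
})
commissions_buy = pd.DataFrame(
    {'num_orders': 0.0},
    index=pd.to_datetime(['2011-01-10', '2011-01-11', '2011-01-12',
                          '2011-01-13', '2011-02-02', '2011-03-03']),
)
commission = 0.01   # assumed scalar

# Index by Date so .loc can line the two frames up
num_buys_per_day = num_buys_per_day.set_index('Date')
commissions_buy.loc[num_buys_per_day.index, 'num_orders'] = (
    num_buys_per_day['count'].values * commission
)
print(commissions_buy)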

how to create a variable in pandas dataframe based on another variable

I have a dataframe data with multiple columns, and I want to create a column filter in the dataframe that is 1 if activation_dt is null and 0 otherwise.
I wrote the code below, but it fails to give the expected result: every row gets 0, regardless of whether the date is present.
data['filter'] = [0 if x is not None else 1 for x in data['activation_dt']]
I think you need isnull to check for None or NaN, and then convert True to 1 and False to 0 with astype(int). (The list comprehension fails because missing values read from a file are NaN, not None, so x is not None evaluates to True and yields 0.)
data = pd.DataFrame({'activation_dt': [None, np.nan, 1]})
print (data)
   activation_dt
0            NaN
1            NaN
2            1.0
data['filter'] = data['activation_dt'].isnull().astype(int)
print (data)
   activation_dt  filter
0            NaN       1
1            NaN       1
2            1.0       0
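The same idea with a couple of made-up dates (isna is the modern alias of isnull; a missing date is stored as NaT, which, like NaN, is not None):
import pandas as pd

data = pd.DataFrame({'activation_dt': pd.to_datetime(['2021-05-01', None, '2021-06-15'])})

# Missing dates are NaT; isna()/isnull() catches them, an `x is not None` check does not
data['filter'] = data['activation_dt'].isna().astype(int)
print(data)
#   activation_dt  filter
# 0    2021-05-01       0
# 1           NaT       1
# 2    2021-06-15       0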

error using astype when NaN exists in a dataframe

df
      A        B
0  a=10  b=20.10
1  a=20      NaN
2   NaN  b=30.10
3  a=40  b=40.10
I tried:
df['A'] = df['A'].str.extract(r'(\d+)').astype(int)
df['B'] = df['B'].str.extract(r'(\d+)').astype(float)
But I get the following errors:
ValueError: cannot convert float NaN to integer
and:
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
How do I fix this?
If some values in a column are missing (NaN), converting the column to numeric always gives a float dtype. You cannot convert the values to int, only to float, because NaN itself is a float.
print (type(np.nan))
<class 'float'>
See the docs on how values are cast when at least one NaN is present:
integer -> cast to float64
If you need int values, replace NaN with some int, e.g. 0 via fillna, and then it works:
df['A'] = df['A'].str.extract(r'(\d+)', expand=False)
df['B'] = df['B'].str.extract(r'(\d+)', expand=False)
print (df)
     A    B
0   10   20
1   20  NaN
2  NaN   30
3   40   40
df1 = df.fillna(0).astype(int)
print (df1)
    A   B
0  10  20
1  20   0
2   0  30
3  40  40
print (df1.dtypes)
A    int32
B    int32
dtype: object
Since pandas 0.24 there is a built-in nullable pandas integer dtype.
It allows integer NaNs, so you don't need to fill them.
Notice the capital 'I' in 'Int64' in the code below.
This is the pandas nullable integer, not the numpy integer.
You need to use: .astype('Int64')
So, do this:
df['A'] = df['A'].str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')
df['B'] = df['B'].str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')
More info on pandas integer na values:
https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions
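Putting it together on the question's frame, a sketch of the nullable-integer route (the regex keeps only the integer part of B, as in the answers above):
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a=10', 'a=20', np.nan, 'a=40'],
                   'B': ['b=20.10', np.nan, 'b=30.10', 'b=40.10']})

# Extract digits, go through float (so NaN survives), then to nullable Int64
df['A'] = df['A'].str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')
df['B'] = df['B'].str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')
print(df.dtypes)   # A and B are both Int64
print(df)          # missing entries show up as <NA>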