ValueError: could not convert string to float: 'n/a' - pandas

My error is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-88-b8c70965fe8e> in <module>
10 new_df["CustomerLocation"] = new_df.CustomerLocation.replace(x, (float(x[0]), float(x[1])))
11 print("floated x:", new_df.CustomerLocation)
---> 12 new_df["MerchantLocation"] = new_df.MerchantLocation.replace(y, (float(y[0]), float(y[1])))
13 print(haversine((float(x[0]), float(x[1])), (float(y[0]), float(y[1])), unit='mi'), "miles")
14 else:
ValueError: could not convert string to float: 'n/a'
My code is:
new_df=pd.DataFrame({'MerchantLocation':(tuple(num) for num in data.merchant_long_lat), 'CustomerLocation': (tuple(num) for num in data.long_lat)})
for index, row in new_df.iterrows():
    x=row.CustomerLocation
    y=row.MerchantLocation
    #new_df["MerchantLocation"] = new_df.MerchantLocation.replace(y, (y[0].replace(r'^\s*$', np.NaN, regex=True), y[1].replace(r'^\s*$', np.NaN, regex=True)))
    if x[0]!="" or x[1]!="" or y[0]!="" or y[1]!="":
        print("x:",x)
        new_df["CustomerLocation"] = new_df.CustomerLocation.replace(x, (float(x[0]), float(x[1])))
        new_df["MerchantLocation"] = new_df.MerchantLocation.replace(y, (float(y[0]), float(y[1])))
        print(haversine((float(x[0]), float(x[1])), (float(y[0]), float(y[1])), unit='mi'), "miles")
    else:
        print("There is an empty string")
I checked the raw Excel data and there is an empty cell. This is from ANZ's Virtual Internship. I can't manage to catch the empty string. Please help!

Try wrapping the problematic code in a try/except block (Python's equivalent of try-catch).
Something like:
if x[0]!="" or x[1]!="" or y[0]!="" or y[1]!="":
    try:
        print("x:",x)
        new_df["CustomerLocation"] = new_df.CustomerLocation.replace(x, (float(x[0]), float(x[1])))
        new_df["MerchantLocation"] = new_df.MerchantLocation.replace(y, (float(y[0]), float(y[1])))
        print(haversine((float(x[0]), float(x[1])), (float(y[0]), float(y[1])), unit='mi'), "miles")
    except ValueError:
        # float() raises ValueError for strings such as 'n/a' or ''
        print("one of these should be N/A")
        print(x[0], x[1], y[0], y[1])
else:
    print("There is an empty string")
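The if test in the question cannot catch these rows: 'n/a' is not an empty string, and with or the branch runs as soon as any one of the four values is non-empty. A minimal sketch of a row-free alternative, assuming each location cell is a (longitude, latitude) pair of strings (the source column is called long_lat), converts every coordinate with pd.to_numeric(errors="coerce") so that 'n/a' and '' become NaN instead of raising. The sample data and the helpers split_coords and haversine_miles are hypothetical; haversine_miles is a hand-written stand-in for the haversine package's haversine(..., unit='mi'), not that package's code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for new_df: each cell is a (longitude, latitude) tuple of
# strings, and some cells contain 'n/a' or '' like the raw data in the question.
new_df = pd.DataFrame({
    "CustomerLocation": [("153.41", "-27.95"), ("n/a", "n/a"), ("151.23", "-33.94")],
    "MerchantLocation": [("153.38", "-27.99"), ("151.21", "-33.87"), ("", "")],
})


def split_coords(col):
    """Split a column of 2-tuples into numeric 'long' and 'lat' columns.

    pd.to_numeric(errors="coerce") turns unparseable text ('n/a', '') into NaN
    instead of raising ValueError the way float() does.
    """
    parts = pd.DataFrame(col.tolist(), index=col.index, columns=["long", "lat"])
    return parts.apply(pd.to_numeric, errors="coerce")


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (hypothetical helper); NaN in, NaN out."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * np.arcsin(np.sqrt(a))  # 3958.8 mi: mean Earth radius


cust = split_coords(new_df["CustomerLocation"])
merch = split_coords(new_df["MerchantLocation"])

# Rows with at least one missing coordinate, handy for inspecting the bad data.
bad_rows = cust.isna().any(axis=1) | merch.isna().any(axis=1)
print(new_df[bad_rows])

# Vectorized distance for every row; rows with NaN coordinates simply get NaN.
new_df["distance_mi"] = haversine_miles(cust["lat"], cust["long"],
                                        merch["lat"], merch["long"])
print(new_df)
```

Because NaN propagates through the arithmetic, no explicit skip logic is needed; new_df[~bad_rows] gives only the rows with usable coordinates if you want to drop the rest.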

Related

Problem while trying to delete row with certain value

I have a problem while trying to delete rows with a certain value:
Error:
ValueError Traceback (most recent call last)
<ipython-input-186-83339e440bcb> in <module>()
1 df.head()
2 df['bathrooms'] = df['bathrooms'].astype('int64')
----> 3 df['bathrooms'] = df[df['bathrooms'] != 28]
/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in _set_item_frame_value(self, key, value)
3727 len_cols = 1 if is_scalar(cols) else len(cols)
3728 if len_cols != len(value.columns):
-> 3729 raise ValueError("Columns must be same length as key")
3730
3731 # align right-hand-side columns if self.columns
ValueError: Columns must be same length as key
Code:
df['bathrooms'] = df['bathrooms'].astype('int64')
df['bathrooms'] = df[df['bathrooms'] != 28]
Any help is very much appreciated.
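df['bathrooms'] != 28 gives you a boolean Series.
df[df['bathrooms'] != 28] gives you a DataFrame.
Then you are assigning that whole DataFrame to a single column: df['bathrooms'] = df[df['bathrooms'] != 28]. The right-hand side has more columns than the one-column key, hence "Columns must be same length as key".
If you want a new DataFrame without those rows, you can do:
df = df[df['bathrooms'] != 28]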
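A small runnable sketch of the fix, using hypothetical data since the question's DataFrame was not included. It shows that the comparison produces a boolean mask and that the filtered result is assigned back to df rather than to a column.

```python
import pandas as pd

# Hypothetical data; the question's DataFrame was not included.
df = pd.DataFrame({"bathrooms": [1.0, 2.0, 28.0, 3.0],
                   "price": [350, 420, 999, 510]})
df["bathrooms"] = df["bathrooms"].astype("int64")

mask = df["bathrooms"] != 28   # boolean Series, one True/False per row
print(mask.tolist())           # [True, True, False, True]

# Keep only the rows where the mask is True, and assign to df, not to a column.
df = df[mask]

# Equivalent alternative: drop the matching rows by index label.
# df = df.drop(df.index[df["bathrooms"] == 28])
print(df)
```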

Why can't I loop over xmltodict?

I've been trying to transform all my logs into dicts with the xmltodict.parse function.
The thing is, when I convert a single row and store it in a variable, it works fine:
a = xmltodict.parse(df['CONFIG'][0])
Same with:
parsed[1] = xmltodict.parse(df['CONFIG'][1])
But when I try to iterate over the entire DataFrame and store the results in a dictionary, I get the following:
for ind in df['CONFIG'].index:
    parsed[ind] = xmltodict.parse(df['CONFIG'][ind])
---------------------------------------------------------------------------
ExpatError Traceback (most recent call last)
/tmp/ipykernel_31/1871123186.py in <module>
1 for ind in df['CONFIG'].index:
----> 2 parsed[ind] = xmltodict.parse(df['CONFIG'][ind])
/opt/conda/lib/python3.9/site-packages/xmltodict.py in parse(xml_input, encoding, expat, process_namespaces, namespace_separator, disable_entities, **kwargs)
325 parser.ParseFile(xml_input)
326 else:
--> 327 parser.Parse(xml_input, True)
328 return handler.item
329
ExpatError: syntax error: line 1, column 0
Can you try this?
for ind in range(len(df['CONFIG'])):
    parsed[ind] = xmltodict.parse(df['CONFIG'][ind])
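The traceback shows xmltodict.parse handing each string to Python's expat parser, and "syntax error: line 1, column 0" means expat rejected the input at its very first character. If the error persists, it helps to find out which CONFIG values are not well-formed XML. A minimal diagnostic sketch using only the standard library's xml.parsers.expat; the helper name find_bad_xml and the sample values are hypothetical.

```python
from xml.parsers import expat


def find_bad_xml(values):
    """Return (position, value, reason) for every entry expat cannot parse."""
    bad = []
    for pos, text in enumerate(values):
        if not isinstance(text, str):
            # e.g. NaN from an empty cell, None, or a number
            bad.append((pos, text, f"not a string: {type(text).__name__}"))
            continue
        parser = expat.ParserCreate()  # a parser instance can only be used once
        try:
            parser.Parse(text, True)
        except expat.ExpatError as exc:
            bad.append((pos, text, str(exc)))
    return bad


# Hypothetical stand-in for df['CONFIG'].tolist()
configs = ["<cfg><a>1</a></cfg>", "n/a", "", None]
for pos, value, reason in find_bad_xml(configs):
    print(pos, repr(value), reason)
```

The reported positions are positional, so with a real DataFrame look them up with df['CONFIG'].iloc[pos].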

How to get rid of "AttributeError: 'float' object has no attribute 'log2'"

Say I have a DataFrame column with a min value of 36884326.0 and a max value of 6619162563.0, which I need to plot as a box plot, so I tried to log-transform the values as follows:
diff["values"] = diff['value'].apply(lambda x: (x+1))
diff["log_values"] = diff['values'].apply(lambda x: x.log2(x))
However, these lines throw the following error:
AttributeError Traceback (most recent call last)
<ipython-input-28-fe4e1d2286b0> in <module>
1 diff['value'].max()
2 diff["values"] = diff['value'].apply(lambda x: (x+1))
----> 3 diff["log_values"] = diff['values'].apply(lambda x: x.log2(x))
~/software/anaconda/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
3192 else:
3193 values = self.astype(object).values
-> 3194 mapped = lib.map_infer(values, f, convert=convert_dtype)
3195
3196 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()
<ipython-input-28-fe4e1d2286b0> in <lambda>(x)
1 diff['value'].max()
2 diff["values"] = diff['value'].apply(lambda x: (x+1))
----> 3 diff["log_values"] = diff['values'].apply(lambda x: x.log2(x))
AttributeError: 'float' object has no attribute 'log2'
Any suggestions would be great. Thanks
You need to apply the numpy.log2 function; a Python float has no log2 method. Please check the numpy.log2 documentation for the syntax.
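A minimal sketch of the fix, using the min and max values from the question as stand-in data. np.log2 is a NumPy ufunc, so it can be applied to the whole column at once instead of through apply.

```python
import numpy as np
import pandas as pd

# Stand-in data: the min and max values mentioned in the question.
diff = pd.DataFrame({"value": [36884326.0, 6619162563.0]})

diff["values"] = diff["value"] + 1            # vectorized, no apply needed
diff["log_values"] = np.log2(diff["values"])  # ufunc applied to the whole column
# Equivalent, but slower: diff["values"].apply(np.log2)
print(diff)
```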

How to fix the calculation error that says 'DataFrame' object is not callable

I'm working on a football dataset and this is the error I'm getting. Please help:
#what is the win rate of HomeTeam?
n_matches = df.shape[0]
n_features = df.shape[1] -1
n_homewin = len(df(df.FTR == 'H'))
win_rate = (float(n_homewin) / (n_matches)) * 100
print ("Total number of matches,{}".format(n_matches))
print ("Number of features,{}".format(n_features))
print ("Number of matches won by home team,{}".format (n_homewin))
print ("win rate of home team,{:.2f}%" .format(win_rate))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-122-7e4d81fc684e> in <module>
5 n_features = df.shape[1] -1
6
----> 7 n_homewin = len(df(df.FTR == 'H'))
8
9 win_rate = (float(n_homewin) / (n_matches)) * 100
TypeError: 'DataFrame' object is not callable
The expected result should print the home team's win rate.
I think the problem is the (); you need [] to filter by boolean indexing:
n_homewin = len(df[df.FTR == 'H'])
Or, more simply, count the True values with sum:
n_homewin = (df.FTR == 'H').sum()
You should change it to df[df.FTR == 'H']. The parentheses imply a function call, and a DataFrame is not callable.
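A runnable sketch of the corrected calculation with a few hypothetical matches, assuming FTR holds H (home win), D (draw) or A (away win) as in common football result datasets. Summing the boolean mask counts the home wins, and its mean gives the fraction directly.

```python
import pandas as pd

# Hypothetical results; FTR assumed to be H (home win), D (draw) or A (away win).
df = pd.DataFrame({"HomeTeam": ["Arsenal", "Chelsea", "Everton", "Leeds"],
                   "FTR": ["H", "D", "H", "A"]})

n_matches = df.shape[0]
n_homewin = (df.FTR == "H").sum()        # each True counts as 1
win_rate = n_homewin / n_matches * 100   # same as (df.FTR == "H").mean() * 100

print("Total number of matches: {}".format(n_matches))
print("Number of matches won by home team: {}".format(n_homewin))
print("Win rate of home team: {:.2f}%".format(win_rate))
```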

Error in using np.NaN in vectorized functions

I am using Python 3 on 64-bit Windows 10. I had issues with the following simple function:
import numpy as np

def skudiscounT(t):
    s = t.find("ITEMADJ")
    if s >= 0:
        t = t[s + 8:]
        if t.find("-") == 2:
            return t
    else:
        return np.nan # if changed to "" it works fine!
I tried to use this function with np.vectorize and got the following error:
Traceback (most recent call last):
File "C:/Users/lz09/Desktop/P3/SODetails_Clean_V1.py", line 45, in <module>
SO["SKUDiscount"] = np.vectorize(skudiscounT)(SO['Description'])
File "C:\PD\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2739, in __call__
return self._vectorize_call(func=func, args=vargs)
File "C:\PD\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2818, in _vectorize_call
res = array(outputs, copy=False, subok=True, dtype=otypes[0])
ValueError: could not convert string to float: '23-126-408'
When I change the last line from return np.nan to return '' it works fine. Does anyone know why this is the case? Thanks!
Without otypes, the dtype of the returned array is determined by the first trial result:
In [232]: f = np.vectorize(skudiscounT)
In [234]: f(['abc'])
Out[234]: array([ nan])
In [235]: _.dtype
Out[235]: dtype('float64')
I'm trying to find an argument that returns a string. It looks like your function can also return None.
From the docs:
The data type of the output of vectorized is determined by calling
the function with the first element of the input. This can be avoided
by specifying the otypes argument.
With otypes:
In [246]: f = np.vectorize(skudiscounT, otypes=[object])
In [247]: f(['abc', '23-126ITEMADJ408'])
Out[247]: array([nan, None], dtype=object)
In [248]: f = np.vectorize(skudiscounT, otypes=['U10'])
In [249]: f(['abc', '23-126ITEMADJ408'])
Out[249]:
array(['nan', 'None'],
dtype='<U4')
But for returning a generic object dtype, I'd use the slightly faster np.frompyfunc:
In [250]: g = np.frompyfunc(skudiscounT, 1,1)
In [251]: g(['abc', '23-126ITEMADJ408'])
Out[251]: array([nan, None], dtype=object)
So what kind of array do you want: a float array that can hold np.nan, a string array, or an object array that can hold 'anything'?
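A small sketch reproducing the behaviour described above, with the question's function and a hypothetical input list. The trial call on "abc" returns np.nan, so np.vectorize picks float64 for the output and then cannot store the string result; passing otypes=[object] keeps both values as they are.

```python
import numpy as np


def skudiscounT(t):
    # Function from the question.
    s = t.find("ITEMADJ")
    if s >= 0:
        t = t[s + 8:]
        if t.find("-") == 2:
            return t
    else:
        return np.nan


descriptions = ["abc", "ITEMADJ 23-126-408"]  # hypothetical sample input

# Without otypes: output dtype inferred as float64 from the first result (nan),
# so converting '23-126-408' to float fails.
try:
    np.vectorize(skudiscounT)(descriptions)
except ValueError as exc:
    print("ValueError:", exc)

# With otypes=[object]: every result is stored as-is.
out = np.vectorize(skudiscounT, otypes=[object])(descriptions)
print(out)
```

For a pandas column, SO['Description'].apply(skudiscounT) also avoids the problem, because pandas infers an object column when the results mix strings and NaN.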