unsupported operand type(s) for -: 'str' and 'str' - Implementation LimeTabularExplainer - typeerror

My issue is that I get the error in the title (TypeError: unsupported operand type(s) for -: 'str' and 'str')
when I try to follow the implementation given in the notebook
https://github.com/klemag/PyconUS_2019-model-interpretability-tutorial/blob/master/02-interpretability_LIME-solution.ipynb
by Kevin Lemagnen.
I have used his suggested way of preprocessing the data and converting them to the format needed by the LIME XAI technique.
I have used the following helper function:
`def convert_to_lime_format(X, categorical_names, col_names=None, invert=False):
    """Converts data with categorical values as string into the right format
    for LIME, with categorical values as integers labels.
    It takes categorical_names, the same dictionary that has to be passed
    to LIME to ensure consistency.
    col_names and invert allow to rebuild the original dataFrame from
    a numpy array in LIME format to be passed to a Pipeline or sklearn
    OneHotEncoder
    """
    # If the data isn't a dataframe, we need to be able to build it
    if not isinstance(X, pd.DataFrame):
        X_lime = pd.DataFrame(X, columns=col_names)
    else:
        X_lime = X.copy()
    for k, v in categorical_names.items():
        if not invert:
            label_map = {
                str_label: int_label for int_label, str_label in enumerate(v)
            }
        else:
            label_map = {
                int_label: str_label for int_label, str_label in enumerate(v)
            }
        X_lime.iloc[:, k] = X_lime.iloc[:, k].map(label_map)
    return X_lime`
How can I fix this issue? Any help would be greatly appreciated.
I have already searched Stack Overflow and googled the TypeError, and I found the following explanation:
The Python error TypeError: unsupported operand type(s) for -: 'str' and 'str' occurs when you try to subtract one string from another, even if both strings contain numbers. The minus operator ('-') is not supported between str operands, and Python does not cast strings to numbers automatically.
In Python, arithmetic operations can be used between valid numbers. For example, you can subtract one number from another, and an integer can be subtracted from a float. If you try to subtract a string from a string that contains a number, the error TypeError: unsupported operand type(s) for -: 'str' and 'str' will be thrown.
Strings cannot be used in Python subtraction. If a number is stored as a string, it should be converted to an integer before subtracting. If you try to subtract a string from a string containing a number, the error TypeError: unsupported operand type(s) for -: 'str' and 'str' will be shown.
However, I was not able to resolve the problem.
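For what it's worth, the quoted explanation boils down to the minimal sketch below in plain Python. The values `a` and `b` are made up for illustration and are not taken from the notebook. In the LIME case the strings would presumably be category labels that are still present in the data handed to the explainer, but that is an assumption and not something the error message alone confirms.

```python
# Hypothetical values: numbers that happen to be stored as strings.
a, b = "7", "2"

try:
    a - b  # str - str is not defined, so this raises TypeError
except TypeError as exc:
    message = str(exc)

# Converting to a numeric type first makes the subtraction valid.
difference = int(a) - int(b)

print(message)
print(difference)
```

The same idea applies to a whole column: every value has to be numeric before any code subtracts it from something else.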

Related

Why Imputation converts float64 column to object and not converting again?

I just did mean imputation on a pandas DataFrame, but after imputation the column is converted to object dtype and cannot be converted back to float. I tried astype(float), which complains that it needs a string, not a method; to_numeric() does not work either.
(Screenshots omitted: the column shown as object, the astype() call, and the resulting error.)
# mean imputation
mean_age = df['Age'].mean
df['Mean_Age'] = df['Age'].fillna(mean_age)
# This turns the column into object dtype, and neither astype() nor to_numeric() converts it back to float
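One thing that stands out in the snippet above, offered as a likely cause and not a confirmed diagnosis: `df['Age'].mean` without parentheses is the method itself, not the mean value, so `fillna` would be filling the gaps with a method object. The sketch below uses a made-up `Age` column (the real data is not shown in the question) and calls `mean()` before filling.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data.
df = pd.DataFrame({"Age": [22.0, np.nan, 30.0]})

# Without parentheses this is a bound method, not a number.
is_method = callable(df["Age"].mean)

# Calling the method yields the float mean, which fillna can use directly.
mean_age = df["Age"].mean()
df["Mean_Age"] = df["Age"].fillna(mean_age)

print(is_method)
print(df["Mean_Age"].dtype)
```

With the call parentheses in place, the filled column should stay numeric, so no conversion back to float is needed.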

TypeError converting from pandas data frame to numpy array

I get a TypeError after converting a pandas DataFrame to a NumPy array (after using pd.get_dummies, or after creating dummy variables from the DataFrame with df.apply) if the columns are of mixed types int, str, and float.
I do not get this error when the columns are only of types int and str.
code:
df = pd.DataFrame({'a':[1,2]*2, 'b':['m','f']*2, 'c':[0.2, .1, .3, .5]})
dfd = pd.get_dummies(df, drop_first=True, dtype=int)
dfd.values
Error: TypeError: '<' not supported between instances of 'str' and 'int'
I get the error with dfd.to_numpy() too.
Even if I convert the DataFrame dfd to int or float values using df.astype,
dfd.to_numpy() still produces the error. I get the error even if I only select columns that were not changed from df.
Goal:
I am one-hot encoding the categorical features of the DataFrame and then want to use SelectKBest with score_func=mutual_info_classif to select some features. The error raised when fitting SelectKBest is the same as the one raised by dfd.to_numpy(), so I assume it occurs when SelectKBest tries to convert the DataFrame to a NumPy array.
However, calling mutual_info_classif directly to get scores for the corresponding features works.
How should I debug this? Thanks.

Calculate pearson correlation between a tensor and a numpy array

I have managed to build a DataFrame of the predicted tensors (y_pred), which have shape (459,1) after reshaping from (459,1,1), and I have the original y values in the other column, which are also float32.
I would like to measure the Pearson correlation between these two columns, but I am getting an error:
pearsonr(df_pred['y_pred'],df_pred['y'])
unsupported operand type(s) for +: 'float' and 'tuple'
So I am not sure whether I can convert the tensor to a NumPy array and add that to the DataFrame. I have tried
predicted= tf.reshape(predicted, [459, 1])
predicted.numpy()
But it does not work. Any ideas?
I think you have to evaluate each tensor in the column to get its value.
df['y_pred'] = df['y_pred'].apply(lambda x: x.eval())
How to get the value of a tensor?
predicted =predicted.numpy()
The above code worked in the end. As the values were appended inside a for loop, writing only
predicted.numpy()
did not work.

pandas df.plot() not working - TypeError: float() argument must be a string or a number, not 'Period'

I am trying to use df.plot() and I keep getting the error below. Could it be caused by the pandas version?
TypeError: float() argument must be a string or a number, not 'Period'
Plotting time series data:

Errors using onehot_encode incorrect input format?

I'm trying to use the mx.nd.onehot_encode function, which should be straightforward, but I'm getting errors that are difficult to parse. Here is the example usage I'm trying.
m0 = mx.nd.zeros(15)
mx.nd.onehot_encode(mx.nd.array([0]), m0)
I expect this to return a 15-dimensional vector (at the same address as m0) with only the first element set to 1. Instead I get the error:
src/ndarray/./ndarray_function.h:73: Check failed: index.ndim() == 1 && proptype.ndim() == 2 OneHotEncode only support 1d index.
Neither ndarray is of dimension 2, so why am I getting this error? Is there some other input format I should be using?
It seems that mxnet.ndarray.onehot_encode requires the target ndarray to explicitly have the shape [1, X].
I tried:
m0 = mx.nd.zeros((1, 15))
mx.nd.onehot_encode(mx.nd.array([0]), m0)
It reported no error.