SHAP Plots Heatmap - numpy

I am creating several plots using the shap.summary_plot and shap.plots.heatmap functions as follows:
explainer = shap.TreeExplainer(rf_gridsearch.best_estimator_)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test, plot_type="dot", color='YlOrRd')
shap.plots.heatmap(shap_values)
With the summary plot I am able to visualize my data and SHAP values; however, with the heatmap I get the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'values'
I would appreciate any input on why I am getting this error and how I can fix it.
I tried running:
shap.plots.heatmap(shap_values, X_test)
I also updated the SHAP library.
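The likely cause is that shap.plots.heatmap expects a shap.Explanation object, while explainer.shap_values returns a raw NumPy array. A minimal sketch of the difference and one possible fix — the shap calls (shown in comments) reuse the names from the question and assume a recent SHAP version:

```python
import numpy as np

# explainer.shap_values(X_test) returns a plain numpy array, which has
# no .values attribute -- hence the AttributeError:
arr = np.zeros((3, 2))
print(hasattr(arr, "values"))  # False

# shap.plots.heatmap expects a shap.Explanation object instead. In
# recent SHAP versions you get one by calling the explainer directly:
#   explainer = shap.TreeExplainer(rf_gridsearch.best_estimator_)
#   explanation = explainer(X_test)      # shap.Explanation
#   shap.plots.heatmap(explanation)      # .values is now available
```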


Can not save pyplot histogram

I am trying to save a pyplot-generated histogram, but when I try to save it I get the error savefig() takes 2 positional arguments but 3 were given. How can I overcome this error?
Here is my code:
dir1 = r"C:\Users\USER\Handcrafted dataset\histogram"
for i, img in enumerate(images1):
    plt.figure(figsize=(5, 5))
    plt.hist(img.ravel(), 256, [0, 256])
    plt.savefig(dir1 + "\\" + str(i) + ".jpg", img)
    plt.show()
Just remove img from the savefig call, and also comment out plt.show(). Since you are creating a new figure every time, there is no reason to pass img to savefig:
    plt.savefig(dir1 + "\\" + str(i) + ".jpg")
    # plt.show()
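A complete, runnable version of the corrected loop might look like this. The image data and output directory here are stand-ins for the questioner's, and PNG is used instead of JPEG so saving does not depend on an extra image backend:

```python
import os
import tempfile

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so no window is needed
import matplotlib.pyplot as plt

# Toy stand-ins for the images and directory in the question
images1 = [np.random.randint(0, 256, (32, 32), dtype=np.uint8) for _ in range(2)]
dir1 = tempfile.mkdtemp()

for i, img in enumerate(images1):
    plt.figure(figsize=(5, 5))
    plt.hist(img.ravel(), 256, [0, 256])
    # savefig takes the output path; the image array is not an argument
    plt.savefig(os.path.join(dir1, str(i) + ".png"))
    plt.close()  # free the figure instead of showing it

print(sorted(os.listdir(dir1)))  # ['0.png', '1.png']
```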

How to calculate TF-IDF (using tft.tfidf function) in Tensorflow Transform

While going through the TensorFlow Transform docs I came across a function to compute TF-IDF.
tft.tfidf(
    x, vocab_size, smooth=True, name=None
)
As the docs do not provide a clear example of how to compute TF-IDF, I tried it with the example strings
example_strings=[["I", "like", "pie", "pie", "pie"], ["yum", "yum", "pie"]]
and a vocab size of 1000 (just a random number), but the code below gives me an attribute error.
tft.tfidf(example_strings, vocab_size=1000)
AttributeError: 'list' object has no attribute 'indices'
Please help me figure this out, as I am new to TensorFlow Transform ops.
If you would like to compute TF-IDF with TFT (here is an example), you can do:
example_strings = ["I like pie pie pie", "yum yum pie"]
VOCAB_SIZE = 100
tf.compat.v1.disable_eager_execution()
tokens = tf.compat.v1.string_split(example_strings)
indices = tft.compute_and_apply_vocabulary(tokens, top_k=VOCAB_SIZE)
bow_indices, weight = tft.tfidf(indices, VOCAB_SIZE + 1)
Otherwise, you can also use the Keras Tokenizer:
tk = tf.keras.preprocessing.text.Tokenizer(num_words=VOCAB_SIZE)
tk.fit_on_texts(example_strings)
tk.sequences_to_matrix(tk.texts_to_sequences(example_strings), mode='tfidf')
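To see what these library calls are computing, here is a plain-Python sketch of a standard TF-IDF weighting. Note that each library (TFT, Keras) applies its own smoothing, so the exact numbers from tft.tfidf or the Tokenizer will differ; this only illustrates the idea:

```python
import math

docs = [["I", "like", "pie", "pie", "pie"], ["yum", "yum", "pie"]]
n_docs = len(docs)

def tfidf(doc):
    """Score each distinct term in one document against the corpus."""
    scores = {}
    for term in set(doc):
        tf = doc.count(term) / len(doc)           # term frequency
        df = sum(term in d for d in docs)         # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed idf
        scores[term] = tf * idf
    return scores

for doc in docs:
    print(tfidf(doc))
```

For example, "pie" in the first document has tf = 3/5 and appears in both documents, so its smoothed idf is log(3/3) + 1 = 1, giving a score of 0.6.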

Creating a custom legend in matplotlib.pyplot

In trying to create a custom legend, I create an array of custom lines from which I want to extract the labels.
In a for loop, I create an array of custom_names like this:
custom_names=[custom_names, [Line2D([0],[0], color=colors[i], marker='None',label=filename[i][0:9])]]
I then try to use Legend to add this as a second, custom legend:
from matplotlib.legend import Legend
leg = Legend(ax, custom_names, loc='lower right')
ax.add_artist(leg)
this produces an error:
leg = Legend(ax, custom_names, loc='lower right')
TypeError: __init__() missing 1 required positional argument: 'labels'
How do I extract the labels used in the creation of the custom_names array so I can use them in the Legend call? Something like:
leg = Legend(ax, custom_names, custom_names.get_label(), loc='lower right')
but this doesn't work.
In fact, if I print custom_names, I get this:
print(custom_names)
[[[[], [<matplotlib.lines.Line2D object at 0x0000021232E72BC8>]], [<matplotlib.lines.Line2D object at 0x0000021232E83E08>]], [<matplotlib.lines.Line2D object at 0x0000021232E8CE08>]]
What is "inside" the [<matplotlib.lines.Line2D object at 0x0000021232E72BC8>] element, for instance? How do I see or get access to the attributes used in creating that array element via the
custom_names=[custom_names, [Line2D([0],[0], color=colors[i], marker='None',label=filename[i][0:9])]]
statement?
I apologize if the question doesn't make sense. I'm struggling with calling things by their proper names (I am not a software guy). I can try to clarify based on any questions raised about this.
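The TypeError occurs because Legend requires both a list of handles and a list of labels, and the nested custom_names structure is not a flat list of Line2D handles. A sketch of one way to fix it — the colors and names here are made-up stand-ins for the questioner's data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.legend import Legend

fig, ax = plt.subplots()
colors = ["red", "green", "blue"]
filenames = ["file_a.txt", "file_b.txt", "file_c.txt"]

# Build a flat list of handles instead of nesting lists inside lists
handles = [Line2D([0], [0], color=c, label=n[0:9])
           for c, n in zip(colors, filenames)]

# Each Line2D remembers its label; get_label() retrieves it
labels = [h.get_label() for h in handles]

# Legend takes (parent_axes, handles, labels, ...)
leg = Legend(ax, handles, labels, loc="lower right")
ax.add_artist(leg)
print(labels)
```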

cartopy problem with pcolormesh 'GeoAxesSubplot' object has no attribute '_hold'

Just trying to learn cartopy, but can't even make a simple pcolormesh of wind gust data...
ax = plt.axes(projection=ccrs.LambertConformal())
cbax = ax.pcolormesh(gust['lon'], gust['lat'], gust['value'],
                     transform=ccrs.PlateCarree())
I get the error AttributeError: 'GeoAxesSubplot' object has no attribute '_hold'
However, the contourf plot does work...
cbax = ax.contourf(gust['lon'], gust['lat'], gust['value'],
                   transform=ccrs.PlateCarree())
How do you make pcolormesh figures with cartopy?
In addition to adjusting package versions, you might also want to try the following code, discussed here.
from matplotlib.axes import Axes
from cartopy.mpl.geoaxes import GeoAxes
GeoAxes._pcolormesh_patched = Axes.pcolormesh
It worked for me at least. Hope this will be helpful.

Stacking list of lists vertically using np.vstack is throwing an error

I am following this piece of code http://queirozf.com/entries/scikit-learn-pipeline-examples in order to develop a multilabel OneVsRest classifier for text. I would like to compute the hamming_score and thus would need to binarize my test labels as well. I thus have:
X_train, X_test, labels_train, labels_test = train_test_split(meetings, labels, test_size=0.4)
Here, labels_train and labels_test are lists of lists:
[['dog', 'cat'], ['cat'], ['people'], ['nice', 'people']]
Now I need to binarize all my labels, I am therefore doing this...
all_labels = np.vstack([labels_train, labels_test])
mlb = MultiLabelBinarizer().fit(all_labels)
as directed in the link, but that throws:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
I used np.column_stack as directed here
numpy array concatenate: "ValueError: all the input arrays must have same number of dimensions"
but that throws the same error.
How can the dimensions be the same if I am splitting into train and test? I am bound to get different shapes, right? Please help, thank you.
MultiLabelBinarizer works on lists of lists directly, so you don't need to stack them with NumPy. Just concatenate the lists:
all_labels = labels_train + labels_test
mlb = MultiLabelBinarizer().fit(all_labels)
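The concatenation works because MultiLabelBinarizer only needs an iterable of label collections; it does not care that the rows have different lengths, which is what np.vstack chokes on. A hypothetical pure-Python re-implementation of what the binarizer produces, for illustration only:

```python
# Ragged label rows, as in the question
labels_train = [["dog", "cat"], ["cat"]]
labels_test = [["people"], ["nice", "people"]]

# Plain list concatenation keeps the ragged structure intact
all_labels = labels_train + labels_test

# Sketch of the binarization MultiLabelBinarizer performs:
# one column per distinct label, 1 if the row contains it
classes = sorted({lab for row in all_labels for lab in row})
binarized = [[1 if c in row else 0 for c in classes] for row in all_labels]

print(classes)     # ['cat', 'dog', 'nice', 'people']
print(binarized)
```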