Plot ROC curves for a multiclass TensorFlow model

I have trained a CNN model using TensorFlow to classify 5 classes.
How do I plot the ROC curve for each of the 5 classes, one-versus-rest?
From the scikit-learn page, it says:
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
But with a TensorFlow dataset I don't have this y_test / y_score structure, and every other tutorial website just copies the same code.
I use tf.keras.preprocessing.image_dataset_from_directory to generate a test_ds variable.
How do I get the tpr and fpr with the model.predict function on test_ds, or by some other method? That is, how do I get y_test and y_score from model.predict and test_ds?
Apart from one-versus-rest, how could I plot the ROC for a specific pair, say class 1 versus class 3?
Thanks!
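A minimal sketch of one way to get y_test and y_score, assuming test_ds was created with shuffle=False (so the label order matches the order of model.predict's output), the default label_mode='int', and a model ending in a 5-way softmax:
import numpy as np
from sklearn.metrics import roc_curve, auc

y_test = np.concatenate([y.numpy() for _, y in test_ds])  # integer label per sample
y_score = model.predict(test_ds)                          # shape (n_samples, 5)

fpr, tpr, roc_auc = {}, {}, {}
for i in range(5):
    # one-versus-rest: class i is the positive class, all others negative
    fpr[i], tpr[i], _ = roc_curve(y_test == i, y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# class 1 versus class 3: keep only samples of those two classes and
# treat class 1 as the positive class
mask = np.isin(y_test, [1, 3])
fpr_13, tpr_13, _ = roc_curve(y_test[mask] == 1, y_score[mask, 1])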

Related

Converting tensorflow dataset to numpy array

I have an autoencoder defined using tf.keras in TensorFlow 1.15. I cannot upgrade to TensorFlow 2.0 for some specific reasons.
This particular autoencoder is used for anomaly detection. I currently compute the AUC score of the autoencoder as follows:
1. All anomalous inputs are labelled 1 and all normal inputs are labelled 0. This is y_true.
2. I feed the autoencoder with unseen inputs and then measure the reconstruction error, like so: errors = np.mean(np.square(data - model.predict(data)), axis=-1)
3. This array of per-sample reconstruction errors is then taken as the predicted score, y_pred.
4. I then compute the AUC using auc = metrics.roc_auc_score(y_true, y_pred).
This approach works well. I now need to move towards using tf.data.Dataset to feed in my data; previously it was numpy arrays. The issue is that I am unable to convert a tf.data.Dataset to a numpy array and hence unable to compute the mean squared error as in step 2.
Once I have a tf.data.Dataset, I feed it for prediction like so: results = model.predict(x_test)
This yields a numpy array, results. I want to compute the mean squared error of results against x_test. However, x_test is of type tf.data.Dataset. So the question is: how can I convert a tf.data.Dataset to a numpy array in TensorFlow 1.15, or what is an alternative method to do this?
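One way to do this in TF 1.x, sketched under the assumption that x_test is a finite tf.data.Dataset yielding input batches only, is to drain the dataset through a one-shot iterator inside a session:
import numpy as np
import tensorflow as tf  # 1.15

iterator = x_test.make_one_shot_iterator()
next_batch = iterator.get_next()

batches = []
with tf.Session() as sess:
    while True:
        try:
            batches.append(sess.run(next_batch))
        except tf.errors.OutOfRangeError:
            # dataset exhausted
            break
data = np.concatenate(batches, axis=0)

# the original numpy-based error computation then works unchanged:
errors = np.mean(np.square(data - model.predict(data)), axis=-1)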

Sparse annotation in U-Net

I am training a U-Net for image segmentation on whole-slide pathology images. I was wondering how I can handle un-annotated areas. I am working with huge tissues and it's impossible to annotate all or even the vast majority of the tissue, so I have annotations from a pathologist who has annotated selected tissue structures of interest to us. That means that in many tiles I'm generating, there is a segment that's not annotated.
Would it affect the U-Net negatively by indirectly indicating that the un-annotated area is negative for one category or another, even though it isn't? How do I handle this case? Does it make sense to mask the image to only the annotated parts, so that un-annotated regions are black?
Thanks
One way to deal with this is to use a weighted loss function where you simply assign a weight of zero to the class you don't want to include. Essentially, you treat the un-annotated area as an additional class that doesn't contribute to the loss. You can find the GitHub repo with a fully functional Keras implementation here.
Specifically, I would use a weighted categorical cross-entropy loss function. You can find an implementation for Keras here:
from keras import backend as K

def weighted_categorical_crossentropy(weights):
    """
    A weighted version of keras.objectives.categorical_crossentropy

    Variables:
        weights: numpy array of shape (C,) where C is the number of classes

    Usage:
        weights = np.array([0.5, 2, 10])  # class 1 at 0.5x, class 2 at 2x, class 3 at 10x the normal weight
        loss = weighted_categorical_crossentropy(weights)
        model.compile(loss=loss, optimizer='adam')
    """
    weights = K.variable(weights)

    def loss(y_true, y_pred):
        # scale predictions so that the class probabilities of each sample sum to 1
        y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
        # clip to prevent NaN's and Inf's
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        # weighted cross-entropy
        loss = y_true * K.log(y_pred) * weights
        loss = -K.sum(loss, -1)
        return loss

    return loss
And you can then compile your model for training like this:
model.compile(optimizer='adam',
              loss=weighted_categorical_crossentropy(np.array([background_weight, foreground_weight, 0])),
              metrics=['accuracy'])
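For completeness, a hypothetical sketch of building the matching 3-class one-hot target, where the third channel marks un-annotated pixels (the class weighted 0 above). The mask encoding (0 = background, 1 = foreground, 255 = un-annotated) is an assumption; adapt it to your own annotation format:
import numpy as np

def to_one_hot(mask):
    # mask: 2-D array with 0 = background, 1 = foreground, 255 = un-annotated (assumed encoding)
    target = np.zeros(mask.shape + (3,), dtype=np.float32)
    target[..., 0] = (mask == 0)    # background
    target[..., 1] = (mask == 1)    # foreground
    target[..., 2] = (mask == 255)  # un-annotated: gets zero loss weight
    return target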

How to display recall and precision in TFLearn?

I'm quite new to TFLearn. I built a CNN classifier which classifies into 17 different classes. The code runs without any problem and shows me the accuracy and the loss. I was wondering how I can display the recall and precision for each class. My code is based on TFLearn's CNN classifier example for the IMDB dataset.
Thank you for your work and your attention!
The best way to do this is to use scikit-learn's metrics library. An example from an LSTM implementation on GitHub:
from sklearn import metrics

print("Precision: {}%".format(100 * metrics.precision_score(y_test, predictions, average="weighted")))
print("Recall: {}%".format(100 * metrics.recall_score(y_test, predictions, average="weighted")))
print("f1_score: {}%".format(100 * metrics.f1_score(y_test, predictions, average="weighted")))
Here, y_test is the Y values of your test data, and predictions is the output of model.predict(X_test), where X_test are the X values of your test data.
Another one to look at is metrics.precision_recall_fscore_support.
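Since the question asks for recall and precision per class, a sketch using average=None, which returns one value per class instead of a weighted average (classes are referred to by their integer indices here):
from sklearn import metrics

precision, recall, fscore, support = metrics.precision_recall_fscore_support(
    y_test, predictions, average=None)  # average=None -> one entry per class
for i, (p, r) in enumerate(zip(precision, recall)):
    print("class {}: precision={:.3f} recall={:.3f}".format(i, p, r))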

Making predictions with a TensorFlow model

I followed the given MNIST tutorials and was able to train a model and evaluate its accuracy. However, the tutorials don't show how to make predictions given a model. I'm not interested in accuracy, I just want to use the model to predict a new example and in the output see all the results (labels), each with its assigned score (sorted or not).
In the "Deep MNIST for Experts" example, see this line:
We can now implement our regression model. It only takes one line! We
multiply the vectorized input images x by the weight matrix W, add the
bias b, and compute the softmax probabilities that are assigned to
each class.
y = tf.nn.softmax(tf.matmul(x,W) + b)
Just pull on node y and you'll have what you want.
feed_dict = {x: [your_image]}
classification = sess.run(y, feed_dict)
print(classification)
This applies to just about any model you create - you'll have computed the prediction probabilities as one of the last steps before computing the loss.
As @dga suggested, you need to run your new instance of the data through your already trained model.
Here is an example:
Assume you went through the first tutorial and calculated the accuracy of your model (the model is this: y = tf.nn.softmax(tf.matmul(x, W) + b)). Now you grab your model and apply a new data point to it. In the following code I compute the output vector, take the position of its maximum value, show the image, and print the predicted digit.
from matplotlib import pyplot as plt
from random import randint
num = randint(0, mnist.test.images.shape[0])
img = mnist.test.images[num]
classification = sess.run(tf.argmax(y, 1), feed_dict={x: [img]})
plt.imshow(img.reshape(28, 28), cmap=plt.cm.binary)
plt.show()
print('NN predicted', classification[0])
2.0 Compatible Answer: Suppose you have built a Keras Model as shown below:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Then train and evaluate the model using the code below:
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
After that, if you want to predict the class of a particular image, you can do it using the below code:
predictions_single = model.predict(img)
If you want to predict the classes of a set of images, you can use the below code:
predictions = model.predict(new_images)
where new_images is an array of images.
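Note that Keras predicts on batches, so a single image needs a leading batch axis before being passed to predict; a small sketch for the 28x28 MNIST case above:
import numpy as np

img_batch = np.expand_dims(img, 0)             # shape (1, 28, 28): a batch of one
predictions_single = model.predict(img_batch)  # shape (1, 10)
predicted_class = np.argmax(predictions_single[0])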
For more information, refer to this TensorFlow tutorial.
The question is specifically about the Google MNIST tutorial, which defines a predictor but doesn't apply it. Using guidance from Jonathan Hui's TensorFlow Estimator blog post, here is code which exactly fits the Google tutorial and does predictions:
from matplotlib import pyplot as plt
import numpy as np

images = mnist.test.images[0:10]

predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": images},
    num_epochs=1,
    shuffle=False)

for image, p in zip(images, mnist_classifier.predict(input_fn=predict_input_fn)):
    print(np.argmax(p['probabilities']))
    plt.imshow(image.reshape(28, 28), cmap=plt.cm.binary)
    plt.show()

Scikit ROC auc raises ValueError: Only one class present in y_true. ROC AUC score is not defined in that case

Trying to create a ROC curve.
model = RandomForestClassifier(500, n_jobs=-1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
probas = model.predict_proba(X_test)[:, 1]

precision = metrics.precision_score(y_test, y_pred)   # returns 0.72
recall = metrics.recall_score(y_test.values, y_pred)  # returns 0.35

y_test.shape  # (39257, 1)

auc = metrics.roc_auc_score(y_test, probas)  # fails:
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
Ended up answering my own question:
I had imported y_test as a pandas DataFrame instead of a Series (I had saved it using to_csv and imported it elsewhere with from_csv).
This confused scikit-learn's ROC computation, but it seems quite happy with a DataFrame everywhere else.
I'll leave this here in the (unlikely) case someone runs into the same thing.
Sometimes we face this with an imbalanced dataset: when splitting, there is a chance that one of the classes ends up absent from one of the resulting sets (typically the test set). So it is better to use a stratified split. Alternatively, if you are facing this while training an MLP model, you can try increasing the batch_size.
I hope this is helpful. Thanks
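For example, a minimal sketch of a stratified split with scikit-learn (X and y stand in for your own features and labels):
from sklearn.model_selection import train_test_split

# stratify=y keeps the class proportions in both splits, so every class
# appears in y_test and roc_auc_score stays defined
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)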