How to optimize the ratio of (True positive)/(False positive) instead of accuracy? - tensorflow

The classic metric is "accuracy", which is related to: (True positive + True negative)/(False positive + False negative)
In my classification problem, a false negative is more tolerable than a false positive. That is, I want to assign more weight to improving (True positive)/(False positive). How can I accomplish this?
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

TensorFlow lets you shift the operating point for these metrics, for example with SensitivityAtSpecificity: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/SensitivityAtSpecificity. If you want the false positives directly (which I think only gives you the number of false positives, if that helps), see FalsePositives: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/FalsePositives. I do not know much about TensorFlow, but I hope this helps.
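As a rough illustration of what shifting the operating point means, here is a minimal NumPy sketch (not TensorFlow code). The labels, scores, and helper names are made up for the example. It counts TP/FP/FN/TN at two decision thresholds. Note that precision, TP/(TP + FP), rises and falls together with the TP/FP ratio, so a precision-style metric (Keras ships tf.keras.metrics.Precision) tracks the quantity the question asks about.

```python
import numpy as np

def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) for binary labels and positive-class scores."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(scores) >= threshold
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    return tp, fp, fn, tn

def precision(tp, fp):
    # TP / (TP + FP); reported as 0.0 when nothing is predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical labels and predicted positive-class probabilities.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
scores = np.array([0.95, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.10])

for threshold in (0.5, 0.75):
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    print(threshold, (tp, fp, fn, tn), precision(tp, fp))
```

In this toy data, raising the threshold removes the single false positive at the cost of one more false negative, which is the trade-off the question is willing to make.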

Related

dealing with unbalanced training data: validation results differ from training

I need to train a binary classifier with unbalanced data (about 3000 images in the 'ok' set and 160 in the 'ko' set). Moreover, I want to minimize false negatives much more than false positives.
For this reason I weighted the 'ko' images 20 times more heavily, and the training (80% of the dataset) went very well (3 false negatives and 15 false positives: accuracy > 99%). However, on the validation dataset I have 19 false negatives and 7 false positives, with nearly 96% accuracy.
96% accuracy is ok for me, but I would like to obtain fewer false negatives.
What can I try to do?

What does it mean when my CNN has zero false negatives?

I'm working on a convolutional neural network to classify an image dataset with binary labels (either 0 or 1). In training the network, each epoch ends up having zero false negatives. Does that mean my network is just classifying everything as 1 and not even bothering to match the 0s?
If so, how can I combat this? The dataset is uneven, but there are more 0s. For the training set, the ratio of 0:1 is about 8000:5000, and for validation 700:500.
Having zero false negatives sounds pretty suspicious. What is your accuracy? What does the confusion matrix look like? Anyway, I would recommend introducing class weights for imbalanced training data.
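To make the confusion-matrix check concrete, here is a small NumPy sketch. It assumes a degenerate model that predicts 1 for every sample and uses the 8000:5000 class counts from the question. The helper name and the inverse-frequency weighting formula are illustrative choices, not something the answer specifies.

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 matrix: rows are true labels (0, 1), columns are predicted labels (0, 1)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Training set sizes from the question: 8000 zeros and 5000 ones.
y_true = np.array([0] * 8000 + [1] * 5000)
y_pred = np.ones_like(y_true)  # hypothetical model that always predicts 1

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
print(cm, accuracy)

# One common heuristic (an assumption here): weight each class by inverse frequency.
counts = np.bincount(y_true)
class_weight = {label: len(y_true) / (2 * n) for label, n in enumerate(counts)}
print(class_weight)
```

If the real confusion matrix looks like this one (an empty first column), the network is indeed predicting 1 everywhere. A dictionary of this shape can be passed as the class_weight argument of Keras model.fit.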

Tensorflow/Keras: Cost function that penalizes specific errors/confusions

I have a classification scenario with more than 10 classes where one class is a dedicated "garbage" class. With a CNN I currently reach accuracies around 96%, which is good enough for me.
In this particular application false positives (recognizing "garbage" as any non-garbage class) are a lot worse than confusions between the non-garbage classes or false negatives (recognizing any non-garbage class instead of "garbage"). To reduce these false positives I am looking for a suitable loss function.
My first idea was to use the categorical crossentropy and add a penalty value whenever a false positive is detected: (pseudocode)
loss = categorical_crossentropy(y_true, y_pred) + weight * penalty
penalty = 1 if (y_true == "garbage" and y_pred != "garbage") else 0
My Keras implementation is:
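from keras import backend as K  # assumed import; the original snippet does not show it

def penalized_cross_entropy(y_true, y_pred, garbage_id=0, weight=1.0):
    # True where the reference label is the garbage class
    ref_is_garbage = K.equal(K.argmax(y_true), garbage_id)
    # True where the predicted (argmax) class is not the garbage class
    hyp_not_garbage = K.not_equal(K.argmax(y_pred), garbage_id)
    penalty_ind = K.all(K.stack([ref_is_garbage, hyp_not_garbage], axis=0), axis=0)  # logical and
    penalty = K.cast(penalty_ind, dtype='float32')
    return K.categorical_crossentropy(y_true, y_pred) + weight * penalty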
I tried different values for weight but I was not able to reduce the false positives. For small values the penalty has no effect at all (as expected) and for very large values (e.g. weight = 50) the network only ever recognizes a single class.
Is my approach complete nonsense or should that in theory work? (It's my first time working with a non-standard loss function.)
Are there other/better ways to penalize such false positive errors? Sadly, most articles focus on binary classification and I could not find much for a multiclass case.
Edit:
As stated in the comments, the penalty from above is not differentiable and therefore has no effect on the training updates. This was my next attempt:
def penalized_cross_entropy(y_true, y_pred, garbage_id=0, weight=1.0):
    ngs = (1 - y_pred[:, garbage_id])  # non garbage score (sum of scores of all non-garbage classes)
    penalty = y_true[:, garbage_id] * ngs / (1. - ngs)
    return K.categorical_crossentropy(y_true, y_pred) + weight * penalty
Here a penalty based on the combined score of all non-garbage classes is added for every sample in the minibatch whose true class is "garbage"; it grows as the predicted garbage score shrinks. For all other samples, the penalty is 0.
I tested the implementation on mnist with a small feedforward network and sgd optimizer using class "5" as "garbage":
With just the crossentropy the accuracy is around 0.9343 and the "false positive rate" (class "5" images recognized as something else) is 0.0093.
With the penalized cross entropy (weight 3.0) the accuracy is 0.9378 and the false positive rate is 0.0016.
So apparently this works; however, I am not sure if it's the best approach. Also, the adam optimizer does not work well with this loss function, which is why I had to use sgd.
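The difference between the two attempts can be seen numerically. The sketch below is a NumPy re-statement of both penalties, not the Keras code itself. The garbage index and the example probabilities are made up. The argmax-based penalty stays constant when the prediction is nudged, so it contributes no gradient, while the score-based penalty responds to the change.

```python
import numpy as np

GARBAGE_ID = 0  # assumed index of the "garbage" class

def hard_penalty(y_true, y_pred):
    """First attempt: 1.0 where garbage is the reference but not the argmax prediction."""
    ref_is_garbage = np.argmax(y_true, axis=-1) == GARBAGE_ID
    hyp_not_garbage = np.argmax(y_pred, axis=-1) != GARBAGE_ID
    return (ref_is_garbage & hyp_not_garbage).astype(float)

def soft_penalty(y_true, y_pred):
    """Second attempt: ngs / (1 - ngs) for garbage samples, 0 otherwise."""
    ngs = 1.0 - y_pred[:, GARBAGE_ID]  # combined score of all non-garbage classes
    return y_true[:, GARBAGE_ID] * ngs / (1.0 - ngs)

# One garbage sample that is currently misclassified as class 1.
y_true = np.array([[1.0, 0.0, 0.0]])
y_pred = np.array([[0.20, 0.50, 0.30]])
# The same sample after a small step towards the garbage class.
y_nudged = np.array([[0.25, 0.47, 0.28]])

print(hard_penalty(y_true, y_pred), hard_penalty(y_true, y_nudged))
print(soft_penalty(y_true, y_pred), soft_penalty(y_true, y_nudged))
```

Because the hard penalty is piecewise constant in y_pred, the optimizer only ever sees the gradient of the plain cross entropy. The soft penalty decreases as the garbage score increases, which gives the optimizer a direction to move in.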

tensorflow batch normalization doesn't work as expected when is_training flag is False

I have a model in which I perform batch normalization after every convolutional layer except the last one. I use the tensorflow.contrib.layers.batch_norm function to do this. When I set the is_training flag to True, the reported loss value seems correct: for my particular example, it starts in the 60s and decreases to almost 0. When I set the is_training flag to False, I get a loss value on the order of 1e10, which seems absurd.
I have attached the snippet I use in my code.
loss = loss_func_l2(logits, y)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=lr)
    Trainables = optimizer.minimize(loss)
# Training
sess = tf.Session()
training(train_output, train_input, sess)  # is_training is true here
# Validation
validate(test_output, train_input, sess)  # is_training is false here
What could be the reason?

weighting true positives vs true negatives

The following TensorFlow loss function is used in Keras/TensorFlow to weight binary decisions.
It weights false positives against false negatives. The unweighted cross entropy is:
targets * -log(sigmoid(logits)) +
(1 - targets) * -log(1 - sigmoid(logits))
The argument pos_weight is used as a multiplier for the positive targets:
targets * -log(sigmoid(logits)) * pos_weight +
(1 - targets) * -log(1 - sigmoid(logits))
Does anybody have any suggestions how in addition true positives could be weighted against true negatives if the loss/reward of them should not have an equal weight?
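For reference, the pos_weight formula quoted above can be sketched in NumPy as follows. The function names are illustrative. The formula appears to correspond to tf.nn.weighted_cross_entropy_with_logits, but this sketch does not call TensorFlow. It uses the identities -log(sigmoid(x)) = softplus(-x) and -log(1 - sigmoid(x)) = softplus(x) to stay numerically stable.

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)), computed stably for large |x|.
    return np.logaddexp(0.0, x)

def weighted_bce_with_logits(targets, logits, pos_weight=1.0):
    """targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits))"""
    targets = np.asarray(targets, dtype=float)
    logits = np.asarray(logits, dtype=float)
    positive_term = targets * softplus(-logits) * pos_weight
    negative_term = (1.0 - targets) * softplus(logits)
    return positive_term + negative_term

targets = np.array([1.0, 0.0])
logits = np.array([0.0, 0.0])  # the model is undecided on both examples
print(weighted_bce_with_logits(targets, logits, pos_weight=1.0))
print(weighted_bce_with_logits(targets, logits, pos_weight=3.0))
```

With pos_weight above 1, errors on positive targets (false negatives) cost more than errors on negative targets (false positives); values below 1 shift the balance the other way.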
First, note that with cross entropy loss, there is some (possibly very very small) penalty to each example (even if correctly classified). For example, if the correct class is 1 and our logit is 10, the penalty will be
-log(sigmoid(10)) ≈ 4.5e-5
This loss (very slightly) pushes the network to produce an even higher logit for this case to get its sigmoid even closer to 1. Similarly, for the negative class, even if the logit is -10, the loss will push it to be even more negative.
This is usually fine because the loss from such terms is very small. If you would like your network to be able to actually reach the minimum of the loss (at a finite logit, so that it is no longer pushed further), you can use label_smoothing. This is probably as close to "rewarding" the network as you can get in the classic setup of minimizing loss (you can obviously "reward" the network by adding some negative number to the loss; that won't change the gradient and training behavior, though).
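The effect of label smoothing on the gradient can be checked with a few lines of NumPy. The sketch assumes the smoothing convention target * (1 - eps) + 0.5 * eps and uses the fact that the derivative of sigmoid cross entropy with respect to the logit is sigmoid(logit) - target. The helper names and eps = 0.1 are made up for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def smooth(target, eps):
    # Assumed binary smoothing convention: pull the target towards 0.5.
    return target * (1.0 - eps) + 0.5 * eps

def bce_grad_wrt_logit(target, logit):
    # Derivative of sigmoid cross entropy with respect to the logit.
    return sigmoid(logit) - target

hard_target = 1.0
soft_target = smooth(hard_target, eps=0.1)  # 0.95

# With a hard target the gradient at logit 10 is tiny but still non-zero.
print(bce_grad_wrt_logit(hard_target, 10.0))

# With a smoothed target the gradient vanishes at a finite logit.
best_logit = np.log(soft_target / (1.0 - soft_target))
print(best_logit, bce_grad_wrt_logit(soft_target, best_logit))
```

With the hard target the logit is pushed up indefinitely. With the smoothed target the push stops once sigmoid(logit) equals the smoothed target, even though the loss value at that point is small but not exactly zero.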
Having said that, you can penalize the network differently for various cases (tp, tn, fp, fn), similarly to what is described in "Weight samples if incorrect guessed in binary cross entropy". (It seems like the implementation there is actually incorrect. You want to use the corresponding elements of the weight_tensor to weight the individual log(sigmoid(...)) terms, not the final output of cross_entropy.)
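A minimal NumPy sketch of such per-case weighting is shown below. It is one possible reading of the idea, not the implementation from the linked question. The function name, the four weight parameters, and the choice to decide the case by the sign of the logit are all assumptions. The weights multiply the per-example loss terms before any averaging.

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)), computed stably.
    return np.logaddexp(0.0, x)

def case_weighted_bce(targets, logits, w_tp=1.0, w_tn=1.0, w_fp=1.0, w_fn=1.0):
    """Per-example sigmoid cross entropy, scaled by a weight chosen per tp/tn/fp/fn case."""
    targets = np.asarray(targets, dtype=float)
    logits = np.asarray(logits, dtype=float)
    predicted_pos = logits > 0.0  # same as sigmoid(logits) > 0.5
    actual_pos = targets > 0.5
    weights = np.select(
        [predicted_pos & actual_pos,     # true positive
         ~predicted_pos & ~actual_pos,   # true negative
         predicted_pos & ~actual_pos,    # false positive
         ~predicted_pos & actual_pos],   # false negative
        [w_tp, w_tn, w_fp, w_fn],
    )
    per_example = targets * softplus(-logits) + (1.0 - targets) * softplus(logits)
    return weights * per_example  # reduce (e.g. mean) only after weighting

targets = np.array([1.0, 0.0, 0.0, 1.0])
logits = np.array([2.0, -2.0, 2.0, -2.0])  # tp, tn, fp, fn in that order
print(case_weighted_bce(targets, logits))
print(case_weighted_bce(targets, logits, w_fp=5.0))
```

In a real training loss the weights would be treated as constants, since the thresholded case assignment itself carries no gradient.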
Using this scheme, you might want to penalize very wrong answers much more than almost right answers. However, note that this is already happening to a degree because of the shape of log(sigmoid(...)).