dealing with unbalanced training data: validation results differ from training - tensorflow

I need to train a binary classifier with unbalanced data (about 3000 images in the 'ok' set and 160 in the 'ko' set). Moreover, I want to minimize false negatives much more than false positives.
For this reason I gave the 'ko' images 20 times the weight of the 'ok' images, and training (on 80% of the dataset) went very well (3 false negatives and 15 false positives: accuracy > 99%). However, on the validation dataset I get 19 false negatives and 7 false positives, with an accuracy of nearly 96%.
96% accuracy is OK for me, but I would like to obtain fewer false negatives.
What can I try to do?
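For a sense of scale, the counts quoted above can be turned into accuracy, recall and precision figures. This is only a sketch: it assumes that 'ko' is the positive class and that the 80/20 split is stratified (about 128 'ko' and 2400 'ok' images in training, 32 and 600 in validation), neither of which the question states. `summarize` is a hypothetical helper.

```python
def summarize(n_pos, n_neg, fn, fp):
    """Derive accuracy, recall and precision from class sizes and error counts."""
    tp = n_pos - fn  # positives that were caught
    tn = n_neg - fp  # negatives that were correctly left alone
    return {
        "accuracy": (tp + tn) / (n_pos + n_neg),
        "recall": tp / n_pos,
        "precision": tp / (tp + fp),
    }

# ASSUMPTION: stratified 80/20 split of 160 'ko' (positive) and 3000 'ok' images.
train = summarize(n_pos=128, n_neg=2400, fn=3, fp=15)
val = summarize(n_pos=32, n_neg=600, fn=19, fp=7)

for name, stats in (("train", train), ("validation", val)):
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

Under those assumptions, validation recall on 'ko' is only about 41% even though accuracy is close to 96%, so accuracy alone says little about the false negatives here.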

Related

How to optimize the ratio of (True positive)/(False positive) instead of accuracy?

The classic metric is "accuracy", which is defined as: (True positive + True negative)/(True positive + True negative + False positive + False negative)
In my classification problem, a false negative is more tolerable than a false positive. That is, I want to give more weight to improving (True positive)/(False positive). How can I accomplish this?
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
TensorFlow has a metric that reports the sensitivity at a chosen specificity: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/SensitivityAtSpecificity. If you want the false positives directly (which, I think, only gives you the number of false positives, if that helps), see https://www.tensorflow.org/api_docs/python/tf/keras/metrics/FalsePositives. I do not know much about TensorFlow, but I hope this helps.
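One way to see the trade-off these metrics describe is to count the confusion-matrix entries at several decision thresholds. The sketch below uses plain Python with made-up scores and labels (not taken from the question); `confusion_at` is a hypothetical helper, not a TensorFlow function.

```python
def confusion_at(scores, labels, threshold):
    """Return (tp, fp, fn, tn) when predicting 1 for every score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Made-up model scores and ground-truth labels, for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.65, 0.60, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

for threshold in (0.3, 0.5, 0.75):
    tp, fp, fn, tn = confusion_at(scores, labels, threshold)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy data, raising the threshold removes false positives at the cost of more false negatives, which is the kind of shift the question is asking for.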

What does it mean when my CNN has zero false negatives?

I'm working on a convolutional neural network to classify an image dataset with binary labels (either 0 or 1). When training the network, each epoch ends up having zero false negatives. Does that mean my network is just classifying everything as 1 and not even bothering to match the 0s?
If so, how can I combat this? The dataset is uneven, but there are more 0s. For the training set, the ratio of 0:1 is about 8000:5000, and for validation 700:500.
Having zero false negatives sounds pretty suspicious. What is your accuracy? What does the confusion matrix look like? Anyway, I would recommend introducing class weights for the imbalanced training data.
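To check the suspicion, it helps to work out what a model that predicts 1 for every image would score. The sketch below uses the training ratio from the question (8000 zeros, 5000 ones). The inverse-frequency weights at the end are one common heuristic for class weights, shown here as an assumption rather than as the exact recommendation above.

```python
n_zeros, n_ones = 8000, 5000  # training set sizes quoted in the question

# Confusion matrix of a degenerate model that predicts 1 for every image.
tp, fn = n_ones, 0   # every real 1 is caught, so there are no false negatives
fp, tn = n_zeros, 0  # every real 0 is wrongly flagged as 1
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"always-1 model: FN={fn}, FP={fp}, accuracy={accuracy:.4f}")

# ASSUMPTION: "balanced" inverse-frequency weights, total / (n_classes * count).
total = n_zeros + n_ones
class_weight = {0: total / (2 * n_zeros), 1: total / (2 * n_ones)}
print("class_weight =", class_weight)
```

If the reported accuracy is far above the roughly 38% of the always-1 model, the network is not simply predicting 1 everywhere. A dict of this shape is what the `class_weight` argument of Keras `Model.fit` accepts.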

Tensorflow/Keras: Cost function that penalizes specific errors/confusions

I have a classification scenario with more than 10 classes where one class is a dedicated "garbage" class. With a CNN I currently reach accuracies around 96%, which is good enough for me.
In this particular application false positives (recognizing "garbage" as any non-garbage class) are a lot worse than confusions between the non-garbage classes or false negatives (recognizing any non-garbage class instead of "garbage"). To reduce these false positives I am looking for a suitable loss function.
My first idea was to use the categorical crossentropy and add a penalty value whenever a false positive is detected: (pseudocode)
loss = categorical_crossentropy(y_true, y_pred) + weight * penalty
penalty = 1 if (y_true == "garbage" and y_pred != "garbage") else 0
My Keras implementation is:
from tensorflow.keras import backend as K  # assumed import: K is the Keras backend

def penalized_cross_entropy(y_true, y_pred, garbage_id=0, weight=1.0):
    # True where the reference label is "garbage"
    ref_is_garbage = K.equal(K.argmax(y_true), garbage_id)
    # True where the predicted class is anything but "garbage"
    hyp_not_garbage = K.not_equal(K.argmax(y_pred), garbage_id)
    penalty_ind = K.all(K.stack([ref_is_garbage, hyp_not_garbage], axis=0), axis=0)  # logical and
    penalty = K.cast(penalty_ind, dtype='float32')
    return K.categorical_crossentropy(y_true, y_pred) + weight * penalty
I tried different values for weight but I was not able to reduce the false positives. For small values the penalty has no effect at all (as expected) and for very large values (e.g. weight = 50) the network only ever recognizes a single class.
Is my approach complete nonsense, or should it work in theory? (It's my first time working with a non-standard loss function.)
Are there other/better ways to penalize such false positive errors? Sadly, most articles focus on binary classification and I could not find much for a multiclass case.
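One way to test whether a penalty like this can influence training is to perturb the predicted probabilities slightly and see whether the penalty changes. The plain-Python sketch below does this for a single sample. The helper names and the example probabilities are made up for illustration.

```python
import math

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

def indicator_penalty(y_true, y_pred, garbage_id=0):
    """1.0 if the true class is garbage but another class is predicted, else 0.0."""
    is_garbage = argmax(y_true) == garbage_id
    not_predicted_garbage = argmax(y_pred) != garbage_id
    return 1.0 if (is_garbage and not_predicted_garbage) else 0.0

def cross_entropy(y_true, y_pred):
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

def finite_difference(fn, y_true, y_pred, index, eps=1e-4):
    """Approximate the slope of fn with respect to y_pred[index]."""
    bumped = list(y_pred)
    bumped[index] += eps
    return (fn(y_true, bumped) - fn(y_true, y_pred)) / eps

y_true = [1.0, 0.0, 0.0]  # true class is "garbage" (index 0)
y_pred = [0.2, 0.5, 0.3]  # class 1 wins, so this sample is a false positive

penalty_slope = finite_difference(indicator_penalty, y_true, y_pred, index=0)
ce_slope = finite_difference(cross_entropy, y_true, y_pred, index=0)
print("penalty slope:", penalty_slope)
print("cross-entropy slope:", round(ce_slope, 3))
```

The penalty's slope is exactly zero because the argmax does not move under a small change, while the cross-entropy's slope is about -5, so only the cross-entropy term would contribute to a gradient step.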
Edit:
As stated in the comments, the penalty above is not differentiable and therefore has no effect on the training updates. This was my next attempt:
def penalized_cross_entropy(y_true, y_pred, garbage_id=0, weight=1.0):
    ngs = (1 - y_pred[:, garbage_id])  # non-garbage score (sum of scores of all non-garbage classes)
    penalty = y_true[:, garbage_id] * ngs / (1. - ngs)
    return K.categorical_crossentropy(y_true, y_pred) + weight * penalty
Here a penalty that grows with the combined score of all non-garbage classes is added for every sample of the minibatch whose true class is garbage, so it is largest for the false positives. For samples whose true class is not garbage, the penalty is 0.
I tested the implementation on mnist with a small feedforward network and sgd optimizer using class "5" as "garbage":
With just the crossentropy the accuracy is around 0.9343 and the "false positive rate" (class "5" images recognized as something else) is 0.0093.
With the penalized cross entropy (weight 3.0) the accuracy is 0.9378 and the false positive rate is 0.0016.
So apparently this works; however, I am not sure if it's the best approach. Also, the Adam optimizer does not work well with this loss function, which is why I had to use SGD.
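To get a feel for the second penalty, it can be evaluated by hand for a few predictions. The sketch below reimplements the penalty term for a single sample in plain Python (`smooth_penalty` is a hypothetical name, and the probabilities are made up).

```python
def smooth_penalty(y_true, y_pred, garbage_id=0):
    """Per-sample version of the penalty term: y_true[g] * ngs / (1 - ngs)."""
    ngs = 1.0 - y_pred[garbage_id]  # combined score of all non-garbage classes
    return y_true[garbage_id] * ngs / (1.0 - ngs)

garbage = [1.0, 0.0, 0.0]  # one-hot label: true class is garbage (index 0)
other = [0.0, 1.0, 0.0]    # one-hot label: true class is not garbage

for p_garbage in (0.9, 0.5, 0.1, 0.01):
    rest = (1.0 - p_garbage) / 2  # split the remaining mass over two classes
    y_pred = [p_garbage, rest, rest]
    print(f"p(garbage)={p_garbage}: "
          f"penalty if garbage={smooth_penalty(garbage, y_pred):.3f}, "
          f"penalty otherwise={smooth_penalty(other, y_pred):.3f}")
```

The penalty is nonzero for every sample whose true class is garbage and grows without bound as the garbage score approaches 0. Whether that is what makes Adam behave poorly here is a guess, not something the post establishes.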

How to find accuracy in multi-GPU Tensorflow training?

When we train in a multi-GPU TensorFlow environment, how do we find the test accuracy and display it in the terminal next to the training loss?
1. Can we find test accuracy on single GPU?
2. Should we find test accuracy on all available GPUs and then calculate the average?
You have to average the accuracies over all GPUs, and that will give you the overall accuracy (this works because each GPU processes the same number of exemplars). Suppose you have 64 exemplars on each of 4 GPUs for a batch size of 256 (= 4 * 64). If 50, 48, 56 and 52 are accurately labeled on each of them respectively, then your overall accuracy is (50 + 48 + 56 + 52) / 256 = 206 / 256 = 80.47%.
Sometimes, when you want to calculate something more complicated than accuracy, you might suppose that the result from one GPU is a good approximation of the overall value and save yourself the trouble of getting all the values from all GPUs. For that to be valid, there are two conditions that have to be satisfied:
1. The batch should be randomly assigned to each GPU; normally the data are shuffled for that purpose.
2. The batch should be large enough for each GPU to receive at least 32 exemplars.
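The arithmetic in this answer can be checked with a few lines of plain Python. The first part uses the counts from the answer; the unequal shard sizes in the second part are a made-up case, added only to show when a simple average of per-GPU accuracies stops matching the pooled accuracy.

```python
def mean_of_accuracies(correct, sizes):
    """Average the per-GPU accuracies, giving every GPU the same weight."""
    return sum(c / n for c, n in zip(correct, sizes)) / len(sizes)

def pooled_accuracy(correct, sizes):
    """Total correct predictions divided by total exemplars."""
    return sum(correct) / sum(sizes)

# Counts from the answer: 4 GPUs with 64 exemplars each.
correct_per_gpu = [50, 48, 56, 52]
examples_per_gpu = [64, 64, 64, 64]
print(mean_of_accuracies(correct_per_gpu, examples_per_gpu))
print(pooled_accuracy(correct_per_gpu, examples_per_gpu))

# HYPOTHETICAL unequal shards: the two numbers no longer agree.
print(mean_of_accuracies([50, 10], [64, 16]))
print(pooled_accuracy([50, 10], [64, 16]))
```

With equal shard sizes both functions return 206 / 256. With unequal shards, the pooled value (or an average weighted by shard size) is the one that matches the overall accuracy.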

Test loss seems to be increased after 20 epochs

I trained MNIST using an 8-layer fully connected neural network (TensorFlow) and got the result shown below. May I know why the test loss increased after epoch 20? Is that due to overfitting? This is the network configuration:
L1: 1600 neurons
L2: 800 neurons
L3: 400 neurons
L4: 200 neurons
L5: 100 neurons
L6: 60 neurons
L7: 30 neurons
L8: 10 neurons
Optimizer: Adam (learning_rate = 0.001)
activation function: Relu
batch size: 64
dropout rate: 0.7
epoch:100
This might very well be due to overfitting. In particular, note how the test loss increases but the test accuracy stays mostly the same (or even keeps increasing). This can arise from the model making wrong predictions (on the test set) with more and more certainty. That is, it doesn't make more wrong predictions as time goes on (explaining the stable or increasing accuracy) but gets ever more confident in its currently wrong predictions (explaining the increasing cost).
This in turn could be due to the model overfitting on characteristics of the training data that don't generalize to the test data. This is particularly true for MNIST, where overfitting to "spurious" features (such as single pixels) is common.
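The effect described above can be reproduced with a handful of numbers. The sketch below is plain Python with made-up probabilities: each value is the probability the model assigns to the true class of one test sample, and, as a simplifying assumption, a sample counts as correct when that probability exceeds 0.5.

```python
import math

def accuracy(p_true):
    """Fraction of samples whose true class gets more than half the probability."""
    return sum(p > 0.5 for p in p_true) / len(p_true)

def mean_cross_entropy(p_true):
    """Average negative log-probability assigned to the true class."""
    return sum(-math.log(p) for p in p_true) / len(p_true)

# Made-up test-set probabilities at an early and a late epoch.
# The third sample is wrong in both; later it is wrong with more confidence.
early = [0.90, 0.80, 0.40, 0.85]
late = [0.99, 0.95, 0.05, 0.97]

for name, probs in (("early", early), ("late", late)):
    print(f"{name}: accuracy={accuracy(probs):.2f}, "
          f"loss={mean_cross_entropy(probs):.4f}")
```

Accuracy is 0.75 in both cases, yet the loss roughly doubles, because the single misclassified sample is now predicted wrongly with much higher confidence.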
You may have seen the benchmarks listed here: the author uses 2 layers with 300 neurons each and gets a high accuracy. You have more neurons, which makes the network overfit more easily, so first try reducing the number of neurons. You also use a large batch, which can make it hard for the network to converge, so secondly try a smaller batch or an even smaller learning rate such as 0.0005. The last thing to try is LeakyRelu(), tanh() or even sigmoid(), because ReLU units may die late in the learning process.