What does it mean when my CNN has zero false negatives? - tensorflow

I'm working on a convolutional neural network to classify an image dataset with binary labels (either 0 or 1). During training, every epoch ends with zero false negatives. Does that mean my network is just classifying everything as 1 and not even trying to identify the 0s?
If so, how can I combat this? The dataset is uneven, but there are more 0s. For the training set, the ratio of 0:1 is about 8000:5000, and for validation 700:500.

Having zero false negatives sounds pretty suspicious. What is your accuracy? What does the confusion matrix look like? In any case, I would recommend introducing class weights for the imbalanced training data.
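To make the confusion matrix check and the class-weight suggestion concrete, here is a minimal sketch in plain Python. The split sizes come from the question; the "always predicts 1" model is a hypothetical worst case, and the weighting formula n_samples / (n_classes * class_count) is one common "balanced" heuristic, not the only option.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary 0/1 labels."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {cls: n_samples / (len(counts) * k) for cls, k in counts.items()}

# Validation split from the question: 700 zeros, 500 ones.
y_val = [0] * 700 + [1] * 500
# Hypothetical degenerate model that outputs 1 for every image.
always_one = [1] * len(y_val)

tn, fp, fn, tp = confusion_counts(y_val, always_one)
val_accuracy = (tp + tn) / len(y_val)
print("tn, fp, fn, tp =", tn, fp, fn, tp)    # zero FN, but also zero TN
print("accuracy =", round(val_accuracy, 4))  # 500 / 1200

# Training split from the question: 8000 zeros, 5000 ones.
weights = balanced_class_weights([0] * 8000 + [1] * 5000)
print(weights)  # the rarer class 1 gets the larger weight
```

If the real confusion matrix looks like this one (zero false negatives and zero true negatives), the network is indeed predicting 1 for everything. A dictionary like `weights` is the shape that Keras accepts through the `class_weight` argument of `model.fit`.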

Related

Small gap between train and test error, does that mean overfitting?

I am working on a dataset of 368609 samples and 34 features. I want to use a neural network to predict latency (a real value) using Keras. The model has 3 hidden layers of 1024 neurons each, and I have used dropout (50%) and L2 regularization (0.001) on each hidden layer. I am getting a test mean absolute error of 3.5505 ms and a train mean absolute error of 3.4528 ms. The train error is smaller than the test error by a small gap. Does this mean that we have an overfitting problem here?
Not really, yet it is still always a good idea to see how your model is generalizing to new data.
Keep something between 10%-20% of your original dataset as a test set and try to predict the output for each record in the test set.
Sometimes, when we reuse the same validation set across many attempts to improve a model, we end up overfitting to that validation set as well.
Having 3 separate datasets for training, validation and testing is the usual safeguard against this.
If you get a high accuracy on your training set and a low accuracy on your test set, it often means that you are overfitting. So in your case: no, you are probably not overfitting.
Normally you would also have a validation set, so that you don't fit your model to the test set.
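The train/validation/test split mentioned in these answers can be sketched as below. This is only an illustration: the function name, the 15%/15% fractions and the fixed seed are assumptions chosen for the example (the answer above suggests holding out something in the 10%-20% range).

```python
import random

def three_way_split(records, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Stand-in for real records: 1000 integer ids.
train, val, test = three_way_split(range(1000))
print(len(train), len(val), len(test))
```

Tune hyperparameters against `val` only, and touch `test` once at the very end, so the reported test error is not something the model was indirectly fitted to.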

What to do when accuracy increasing but loss is also increasing on validation data?

I'm currently working on a multi-class classification problem which is highly imbalanced. I want to save my model weights for the best epoch, but I'm confused about which metric I should use to choose it.
Here's my training progress bar :
I am using ModelCheckpoint callback in tf.keras and monitoring val_loss as a metric to save best model weights.
As you can see in the image,
At the 8th epoch I got val_acc = 0.9845 but val_loss = 0.629, and precision and recall are also high here.
But at the 3rd epoch I got val_acc = 0.9840 but val_loss = 0.590.
I understand the difference is not huge, but in such cases which metric should I trust on an imbalanced dataset?
The most important factors are the validation and training errors.
If the validation loss (error) starts to increase, that indicates overfitting. Set the number of epochs as high as is practical and stop training based on the error rates: as long as the validation loss keeps dropping, training should continue, until the model converges at some nth epoch. Ideally it converges to a low val_loss.
Just bear in mind an epoch is one learning cycle where the learner can see the whole training data set. If you have two batches, the learner needs to go through two iterations for one epoch.
This link can be helpful.
You can divide the data into 3 sets: training, validation and evaluation. Train each network for enough epochs that the training Mean Squared Error settles into a minimum.
Training uses the training set and runs epoch by epoch; after each epoch, calculate the Mean Squared Error of the network on the validation set. The network from the epoch with the minimum validation MSE is then selected for the evaluation process.
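As a rough illustration of "keep the epoch with the best validation metric", here is a small sketch using only the two epochs quoted in the question (the other epochs are not given, so they are left out). The `history` layout and the `best_epoch` helper are made up for this example.

```python
# Only the two epochs reported in the question; everything else is unknown.
history = {
    3: {"val_loss": 0.590, "val_acc": 0.9840},
    8: {"val_loss": 0.629, "val_acc": 0.9845},
}

def best_epoch(history, metric, mode):
    """Return the epoch whose metric is lowest (mode='min') or highest (mode='max')."""
    pick = min if mode == "min" else max
    return pick(history, key=lambda epoch: history[epoch][metric])

by_loss = best_epoch(history, "val_loss", "min")
by_acc = best_epoch(history, "val_acc", "max")
print("best by val_loss:", by_loss)
print("best by val_acc:", by_acc)
```

The two criteria disagree here: monitoring val_loss keeps epoch 3, while monitoring val_acc keeps epoch 8. Which one to prefer is the judgment call the question is asking about; the sketch only shows that the checkpoint you end up with depends on the monitored metric.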
This can happen for several reasons. Assuming you have properly separated the train, test and validation sets and preprocessed the data (for example min-max scaling and handling missing values), you can do the following.
First, run the model for several epochs and plot the validation loss.
If the loss first decreases and then, after a certain point, starts increasing (a U-shaped curve), you can use early stopping.
In the other scenario, where the loss increases steadily, early stopping won't help. In this case, add dropout layers with a rate of 0.2-0.3 between the major layers. This introduces randomness in the layers and discourages the model from memorising.
Once you add dropout, your model may suddenly start to behave strangely. Tweak the activation functions and the number of output nodes or Dense layers until it behaves properly again.
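The early-stopping rule for the U-shaped case can be sketched in plain Python as follows. The loss curves are invented for illustration, and `patience` and `min_delta` are names borrowed from the usual convention (tf.keras ships an `EarlyStopping` callback with `monitor` and `patience` arguments that serves the same purpose in practice).

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Made-up U-shaped validation loss curve.
u_shaped = [0.90, 0.70, 0.60, 0.58, 0.61, 0.65, 0.72]
u_result = early_stopping_epoch(u_shaped)
print(u_result)  # stops shortly after the minimum

# Made-up steadily increasing curve: the "best" epoch is simply the first one.
rising = [0.50, 0.60, 0.70, 0.80]
rising_result = early_stopping_epoch(rising)
print(rising_result)
```

On the rising curve the rule fires almost immediately and hands back the first epoch, which is why early stopping alone does not help in that scenario and regularization such as dropout is suggested instead.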

How do I know when to stop training my CNN?

I've been training my CNN and got the following as results:
I just know that the training and validation accuracy needs to both be high, but are these numbers good enough? How do I know when to stop? Should I concern myself with the losses, or only accuracy? Which epoch shows the best result so far?
The loss value indicates how poorly or well a model behaves after each iteration of optimization, whereas the accuracy of a model is usually determined after the model parameters have been learned and fixed, and is expressed as a percentage.
Yes, training and validation accuracy should both be high. Whether the numbers are good enough always depends on the subject area. In the medical domain, for example, these numbers are not good.
If you have serious class imbalance, your model will maximize accuracy by simply always picking the most common class, but this would not be a useful model. In this case cross entropy or log-loss would be a better loss function to optimize.
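A quick sketch of why accuracy alone can mislead under class imbalance. The 950:50 class counts and the predicted probabilities are hypothetical, chosen only to make the effect visible.

```python
import math

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall_for(y_true, y_pred, cls):
    """Fraction of samples of class `cls` that were predicted as `cls`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant)

def log_loss(y_true, prob_of_one):
    """Mean binary cross-entropy given the predicted probability of class 1."""
    total = 0.0
    for t, p in zip(y_true, prob_of_one):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical imbalanced labels: 950 of class 0, 50 of class 1.
y_true = [0] * 950 + [1] * 50

# A model that always picks the most common class.
majority_pred = [0] * len(y_true)
acc = accuracy(y_true, majority_pred)
rec = recall_for(y_true, majority_pred, cls=1)
print("accuracy:", acc)            # looks high
print("recall on class 1:", rec)   # the minority class is never found

# Log-loss compares probabilities: an overconfident "always 0" model
# versus one that just outputs the class prior.
overconfident = log_loss(y_true, [0.01] * len(y_true))
prior_only = log_loss(y_true, [0.05] * len(y_true))
print("log-loss overconfident:", round(overconfident, 4))
print("log-loss prior only:", round(prior_only, 4))
```

The majority-class model scores high accuracy while finding none of the minority class, and the log-loss penalizes its confident mistakes on those 50 samples, which accuracy does not see.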
Generally the lower the loss the better a model unless the model has overfitted to the training data.
The 10th epoch is best: that is where you got higher validation accuracy together with lower validation loss.

Convolutional neural network doesn't classify test set keras

I have a 3-D convolutional neural network [keras, tensorflow] and 3D brain images of people with advanced Alzheimer's, early Alzheimer's and healthy people (3 classes). I have a training set of 324 images and a test set of 74 images. When I trained my CNN, I got about 65-70% accuracy on the training set, but only 30-40% on the test set. When I used the test data as validation data, the training accuracy was also no more than 37%, and the loss stayed at the same level the whole time. No matter which parameters I change, the result is the same. I load my prepared and normalized data from an .h5 file into Python, and the input has shape (None, 90, 120, 80, 1). I have no idea what may be wrong; I checked the code many times and everything seems to be correct.
My CNN has 4 Conv3D layers, 3 max-pooling layers, ReLU activations and batch normalization, 3 dense layers with dropout, and a softmax output.
I appreciate any help or ideas.
If you only have 65-70% accuracy on your training data, that is really poor and indicates your neural network is not converging properly. Your network should be capable of at least overfitting the training data if the structure is complex enough, by effectively learning to hardcode the outputs for the small input sample. By the sound of it, your structure is complex enough.
The first thing to try is to reduce the learning rate by a factor of 10, and turn off validation/early stopping/normalisation/regularisation and any other ways to prevent overfitting. Then rinse, repeat - more iterations, each reducing the LR by a factor of 10 - until you can overfit the training data to where it gets close to 100% on the training data.
You can then work on putting in the proper early stopping, dropout, normalisation, regularisation etc to prevent overfitting with a learning rate you know works.
If the network still cannot overfit the training data no matter how small the LR is, then you have some issue with your NN structure.
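The "reduce the LR by 10x until the network can overfit" loop might be organised like this. Everything here is a sketch: `train_fn` is a hypothetical callable you would write yourself (train with regularisation switched off, return the final training accuracy), and `fake_train_fn` is a stand-in so the loop can run without a real model.

```python
def sweep_learning_rate(train_fn, start_lr=1e-2, factor=10.0,
                        max_tries=5, target_acc=0.99):
    """Divide the learning rate by `factor` until training accuracy hits target_acc.

    Returns (working_lr, tried); working_lr is None if no tried rate could overfit.
    """
    lr = start_lr
    tried = []
    for _ in range(max_tries):
        acc = train_fn(lr)
        tried.append((lr, acc))
        if acc >= target_acc:
            return lr, tried
        lr /= factor
    return None, tried

def fake_train_fn(lr):
    """Stand-in for a real training run: pretends only small rates converge."""
    return 1.0 if lr < 5e-4 else 0.37

found_lr, tried = sweep_learning_rate(fake_train_fn)
for lr, acc in tried:
    print(f"lr={lr:g} train_acc={acc}")
print("first learning rate that overfits:", found_lr)
```

If `found_lr` comes back as `None` with a real `train_fn`, that corresponds to the last case in the answer: no learning rate lets the network memorise the training set, so the architecture or the data pipeline is the more likely culprit.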

Tensorflow: loss decreasing, but accuracy stable

My team is training a CNN in Tensorflow for binary classification of damaged/acceptable parts. We created our code by modifying the cifar10 example code. In my prior experience with Neural Networks, I always trained until the loss was very close to 0 (well below 1). However, we are now evaluating our model with a validation set during training (on a separate GPU), and it seems like the precision stopped increasing after about 6.7k steps, while the loss is still dropping steadily after over 40k steps. Is this due to overfitting? Should we expect to see another spike in accuracy once the loss is very close to zero? The current max accuracy is not acceptable. Should we kill it and keep tuning? What do you recommend? Here is our modified code and graphs of the training process.
https://gist.github.com/justineyster/6226535a8ee3f567e759c2ff2ae3776b
Precision and Loss Images
A decrease in binary cross-entropy loss does not imply an increase in accuracy. Consider a sample with label 1, predictions 0.2, 0.4 and 0.6 at timesteps 1, 2 and 3, and a classification threshold of 0.5. Going from timestep 1 to timestep 2 decreases the loss but does not change the accuracy, because both predictions are still below the threshold.
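The numbers in that example can be checked directly. This is a small worked sketch for a single sample with the standard binary cross-entropy formula; the variable names are just for illustration.

```python
import math

def binary_cross_entropy(label, p):
    """Binary cross-entropy for one sample with predicted probability p of class 1."""
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

label = 1
threshold = 0.5
predictions = [0.2, 0.4, 0.6]  # timesteps 1, 2, 3 from the example

losses = []
correct = []
for step, p in enumerate(predictions, start=1):
    loss = binary_cross_entropy(label, p)
    predicted_class = 1 if p >= threshold else 0
    losses.append(loss)
    correct.append(predicted_class == label)
    print(f"timestep {step}: p={p} loss={loss:.4f} correct={predicted_class == label}")
```

The loss falls at every step (roughly 1.61, 0.92, 0.51), but the prediction only becomes correct at timestep 3, once the probability crosses the threshold. Averaged over many samples, the loss can keep dropping for a long time while accuracy stays flat.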
Ensure that your model has enough capacity by overfitting the training data. If the model is overfitting the training data, avoid overfitting by using regularization techniques such as dropout, L1 and L2 regularization and data augmentation.
Last, confirm your validation data and training data come from the same distribution.
Here are my suggestions. One possible problem is that your network has started to memorize the data, so yes, you should increase regularization.
update:
Here I want to mention one more problem that may cause this:
The class balance ratio in the validation set may be far from the one in the training set. As a first step, I would recommend trying to understand what your test data (the real-world data your model will face at inference time) looks like descriptively: its balance ratio and other similar characteristics. Then try to build train and validation sets with almost the same descriptive statistics as the real data.
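One simple way to keep the class ratio consistent between train and validation is a stratified split, sketched below. Note this is a simplification of the advice above: it makes the two splits match each other, not necessarily the real-world distribution, which you would still need to measure separately. The 800:200 labels and the helper names are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.2, seed=0):
    """Split indices so each class contributes the same fraction to validation."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx

def positive_ratio(labels, indices):
    """Share of class-1 samples among the given indices."""
    return sum(labels[i] for i in indices) / len(indices)

# Hypothetical labels: 800 acceptable (0) and 200 damaged (1) parts.
labels = [0] * 800 + [1] * 200
train_idx, val_idx = stratified_split(labels)

train_ratio = positive_ratio(labels, train_idx)
val_ratio = positive_ratio(labels, val_idx)
print("train positive ratio:", train_ratio)
print("validation positive ratio:", val_ratio)
```

Comparing `train_ratio` and `val_ratio` (and the same number measured on real inference data, if available) is a cheap first check before blaming the model.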
Well, I faced a similar situation when I used a softmax function in the last layer instead of sigmoid for binary classification.
My validation loss and training loss were decreasing, but the accuracy of both remained constant. That taught me why sigmoid is used for binary classification.