Why does the accuracy drop to zero in each epoch, while training convlstm layers in keras? - tensorflow

I am trying to use ConvLSTM layers in Keras 2 to train an action recognition model. The model has 3 ConvLSTM layers and 2 Fully Connected ones.
At each and every epoch the accuracy for the first batch (usually more than one) is zero and then it increases to some amount more than the previous epoch. For example, the first epoch finishes at 0.3 and the next would finish at 0.4 and so on.
My question is why does it get back to zero at each epoch?
p.s.
The ConvLSTM is stateless.
The model is compiled with SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True), for some reason it does not converge using Adam.

So - in order to understand why something like this happening you need to understand how keras computes accuracy during the batch computation:
Before each batch - a number of positively classified examples is stored.
After each batch - a number of positively classified examples is stored and it's printed after division by all examples used in training.
As your accuracy is pretty low it's highly probable that in a first few batches none of the examples will be classified properly. Especially when you have a small batch. This makes accuracy to be 0 at the beginning of your training.

Related

What to do when accuracy increasing but loss is also increasing on validation data?

I'm currently working on a multi-class classification problem which is highly imbalanced. I want to save my model weights for best epoch but I'm confused on which metric I should choose?
Here's my training progress bar :
I am using ModelCheckpoint callback in tf.keras and monitoring val_loss as a metric to save best model weights.
As you can see in the image,
At 8th epoch I got an val_acc = 0.9845 but val_loss = 0.629 and precision and recall is also high here.
But at 3rd epoch I got val_acc = 0.9840 but val_loss = 0.590
I understand the difference is not huge but in such cases what's the ideal metric to believe on imbalanced dataset?
The most important factors are the the validation and training error.
If the validation loss (error) is going to increase so means overfitting. You must set the number of epochs as high as possible and avoid the overfitting and terminate training based on the error rates. . As long as it keeps dropping training should continue. Till model start to converge at nth epochs. Indeed it should converge quite well to a low val_loss.
Just bear in mind an epoch is one learning cycle where the learner can see the whole training data set. If you have two batches, the learner needs to go through two iterations for one epoch.
This link can be helpful.
You can divide data in 3 data sets, training, validation and evaluation. Train each network along enough number of epochs to track the training Mean Squared Error to be stuck in a minimum.
The training process uses training data-set and should be executed epoch by epoch, then calculate the Mean Squared Error of the network in each epoch for the validation set. The network for the epoch with the minimum validation MSE is selected for the evaluation process.
This can happen for several reasons. Assuming you have used proper separation of train, test and validation set and preprocessing of datasets like min-max scaler, adjusting missing values, you can do the following.
First run the model for several epoch and plot the validation loss graph.
If the loss is first reducing and after reaching a certain point it is now increasing, if the graph is in U shape, then you can do early stopping.
In other scenario, when loss is steadily increasing, early stopping won't work. In this case, add dropout layer of 0.2-0.3 in between the major layers. This will introduce randomness in the layers and will stop the model from memorising.
Now once you add dropouts, your model may suddenly start to behave strange. Tweak with activation functions and number of output nodes or Dense layer and it will eventually get right.

Validation loss oscillates a lot, validation accuracy > learning accuracy, but test accuracy is high. Is my model overfitting?

I am training a model, and using the original learning rate of the author (I use their github too), I get a validation loss that keeps oscillating a lot, it will decrease but then suddenly jump to a large value and then decrease again, but never really converges as the lowest it gets is 2 (while training loss converges to 0.0 something - much below 1)
At each epoch I get the training accuracy and at the end, the validation accuracy. Validation accuracy is always greater than the training accuracy.
When I test on real test data, I get good results, but I wonder if my model is overfitting. I expect a good model's val loss to converge in a similar fashion with training loss, but this doesn't happen and the fact that the val loss oscillates to very large values at times worries me.
Adjusting the learning rate and scheduler etc etc, I got the val loss and training loss to a downward fashion with less oscilliation, but this time my test accuracy remains low (as well as training and validation accuracies)
I did try a couple of optimizers (adam, sgd, adagrad) with step scheduler and also the pleateu one of pytorch, I played with step sizes etc. but it didn't really help, neither did clipping gradients.
Is my model overfitting?
If so, how can I reduce the overfitting besides data augmentation?
If not (I read some people on quora said it is nothing to worry about, though I would think it must be overfitting), how can I justify it? Even if I would get similar results for a k-fold experiment, would it be good enough? I don't feel it would justify the oscilliating. How should I proceed?
The training loss at each epoch is usually computed on the entire training set.
The validation loss at each epoch is usually computed on one minibatch of the validation set, so it is normal for it to be more noisey.
Solution: You can report the Exponential Moving Average of the validation loss across different epochs to have less fluctuations.
It is not overfitting since your validation accuracy is not less than the training accuracy. In fact, it sounds like your model is underfitting since your validation accuracy > training accuracy.

Time taken to train Resnet on CIFAR-10

I was writing a neural net to train Resnet on CIFAR-10 dataset.
The paper Deep Residual Learning For Image Recognition mentions training for around 60,000 epochs.
I was wondering - what exactly does an epoch refer to in this case? Is it a single pass through a minibatch of size 128 (which would mean around 150 passes through the entire 50000 image training set?
Also how long is this expected to take to train(assume CPU only, 20-layer or 32-layer ResNet)? With the above definition of an epoch, it seems it would take a very long time...
I was expecting something around 2-3 hours only, which is equivalent to about 10 passes through the 50000 image training set.
The paper never mentions 60000 epochs. An epoch is generally taken to mean one pass over the full dataset. 60000 epochs would be insane. They use 64000 iterations on CIFAR-10. An iteration involves processing one minibatch, computing and then applying gradients.
You are correct in that this means >150 passes over the dataset (these are the epochs). Modern neural network models often take days or weeks to train. ResNets in particular are troublesome due to their massive size/depth. Note that in the paper they mention training the model on two GPUs which will be much faster than on the CPU.
If you are just training some models "for fun" I would recommend scaling them down significantly. Try 8 layers or so; even this might be too much. If you are doing this for research/production use, get some GPUs.

More epochs or more layers?

What is the difference in training if one uses more epochs or more layers?
Should these train equally, assuming consistent hyperparams?
for epoch in range(20):
LSTM
and
for epoch in range(5):
LSTM -> LSTM -> LSTM -> LSTM
I understand that there would be a difference after training. In the first case, you would send any test batch through one trained LSTM cell, while in the 2nd case, it would go through 4 trained cells. My question pertains to training.
Seems they should be identical.
I think you make a big confusion between very different concepts. Let's us go back to the basics. Very simply, in a supervised machine learning experiment you have some training data X, and a model. A model is like a function with internal parameters, you give it some data and it gives you back a prediction. Here, let us say our model has one layer, which is an LSTM. That means the parameters of our model are the parameters of the LSTM (I won't go into what they are, if you don't know them you should read the paper introducing LSTMs).
What is an epoch: very roughly, "training for n epochs" means looping n times on the training data. You show each example n times to the model for update. The more epochs the more you get your network acustomed to your training data. (I'm being very overly simplistic).
I hope it is clearer now that epochs and layers are in no way related to the layers. The layers are what your model is made of, and the epochs is about how many times you will show your examples to the model.
If you put 5 LSTM layers, you will just have 5 times more parameters. But in any case, each of your training examples will go through the 1 or 5 stacked LSTM layers...

Tensorflow: loss decreasing, but accuracy stable

My team is training a CNN in Tensorflow for binary classification of damaged/acceptable parts. We created our code by modifying the cifar10 example code. In my prior experience with Neural Networks, I always trained until the loss was very close to 0 (well below 1). However, we are now evaluating our model with a validation set during training (on a separate GPU), and it seems like the precision stopped increasing after about 6.7k steps, while the loss is still dropping steadily after over 40k steps. Is this due to overfitting? Should we expect to see another spike in accuracy once the loss is very close to zero? The current max accuracy is not acceptable. Should we kill it and keep tuning? What do you recommend? Here is our modified code and graphs of the training process.
https://gist.github.com/justineyster/6226535a8ee3f567e759c2ff2ae3776b
Precision and Loss Images
A decrease in binary cross-entropy loss does not imply an increase in accuracy. Consider label 1, predictions 0.2, 0.4 and 0.6 at timesteps 1, 2, 3 and classification threshold 0.5. timesteps 1 and 2 will produce a decrease in loss but no increase in accuracy.
Ensure that your model has enough capacity by overfitting the training data. If the model is overfitting the training data, avoid overfitting by using regularization techniques such as dropout, L1 and L2 regularization and data augmentation.
Last, confirm your validation data and training data come from the same distribution.
Here are my suggestions, one of the possible problems is that your network start to memorize data, yes you should increase regularization,
update:
Here I want to mention one more problem that may cause this:
The balance ratio in the validation set is much far away from what you have in the training set. I would recommend, at first step try to understand what is your test data (real-world data, the one your model will face in inference time) descriptive look like, what is its balance ratio, and other similar characteristics. Then try to build such a train/validation set almost with the same descriptive you achieve for real data.
Well, I faced the similar situation when I used Softmax function in the last layer instead of Sigmoid for binary classification.
My validation loss and training loss were decreasing but accuracy of both remained constant. So this gave me lesson why sigmoid is used for binary classification.