Issue that I am facing: the training MAE loss, training MAE error (metric) and validation MAE loss are decreasing, but the validation MAE error fluctuates and its overall trend is not decreasing.
Model description: the model consists of two networks (a TimeDistributed CNN for the images and TimeDistributed Dense layers for the tabular features, each applied to every temporal slice of the input sample) whose outputs are merged and then fed to the first LSTM layer as a time sequence. The output of this LSTM goes to a second LSTM layer, which produces the final output.
Model loss and metric:
tensorflow model loss in model.compile method = MAE
tensorflow model metric in model.compile method = MAE
Model output: a forecast of 6 values of the target label at 10-minute intervals (i.e. 6 values for the next hour).
Model input: data for one hour. Images go to the CNN branch, and the tabular features for each image's timestamp go to the Dense branch. Both the images and their tabular features are sampled at 10-minute intervals.
Training size: 76,600 samples (from year 2018 to 2021)
Validation size: 13,500 samples (year 2022)
Regularization used: L2 and Dropout in CNN & dense branches and LSTM.
Learning rate used: 2e-4 which decays at a rate of 0.4 after every 5 epochs till 1e-5.
Epochs: 20
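A minimal sketch of the learning rate schedule described above. It assumes that "decays at a rate of 0.4" means the rate is multiplied by 0.4 once every 5 epochs and never drops below 1e-5; the helper name `step_decay_lr` is hypothetical.

```python
def step_decay_lr(epoch, initial_lr=2e-4, factor=0.4, step=5, min_lr=1e-5):
    """Learning rate for a 0-indexed epoch under step decay with a lower bound.

    Assumption: the rate is multiplied by `factor` once every `step` epochs.
    """
    lr = initial_lr * factor ** (epoch // step)
    return max(lr, min_lr)


# Print the schedule for the 20 training epochs
for epoch in range(20):
    print(f"epoch {epoch:2d}: lr = {step_decay_lr(epoch):.3g}")
```

Under that reading, the 20 epochs run at 2e-4, 8e-5, 3.2e-5 and 1.28e-5 (five epochs each), so the 1e-5 floor is never actually reached. In Keras the function could be plugged in as `tf.keras.callbacks.LearningRateScheduler(lambda epoch: step_decay_lr(epoch))`.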
E.g.: input data from 9:00am to 9:50am consists of 6 timestamps at 10-minute intervals. At each timestamp we have an image and its corresponding tabular features. The output is 6 values of the target label, at 10:00am, 10:10am, 10:20am, 10:30am, 10:40am and 10:50am.
What I have tried: I have experimented with the learning rate and the number of layers in the model, which helped decrease everything except the validation MAE error. If I apply more regularization, the training MAE loss, training MAE error and validation MAE loss don't decrease as much as they do with little regularization.
I don't understand what it means when everything except the validation MAE error is decreasing.
Also, in case it matters: the training MAE loss decreases faster than the validation MAE loss.
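The input/output layout in the example above can be sketched as a sliding window over time. This assumes consecutive samples start one step (10 minutes) apart and that the image array would be windowed with the same indices as the tabular features; `make_windows` and the toy arrays are hypothetical.

```python
import numpy as np


def make_windows(features, target, n_in=6, n_out=6):
    """Build (input, target) pairs from series sampled every 10 minutes.

    features: array whose first axis is time (e.g. shape (T, n_features)).
    target:   1-D array of length T with the label to forecast.
    Each input covers n_in consecutive steps (9:00-9:50) and each target
    the following n_out steps (10:00-10:50).
    """
    features = np.asarray(features)
    target = np.asarray(target)
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    n_samples = len(target) - n_in - n_out + 1
    if n_samples < 1:
        raise ValueError("series too short for one window")
    # Window i starts at step i, so consecutive samples are 10 minutes apart
    X = np.stack([features[i:i + n_in] for i in range(n_samples)])
    y = np.stack([target[i + n_in:i + n_in + n_out] for i in range(n_samples)])
    return X, y


# 12 steps (9:00 ... 10:50) with 2 made-up tabular features per step
features = np.arange(24).reshape(12, 2)
target = np.arange(12)
X, y = make_windows(features, target)
print(X.shape, y.shape)  # (1, 6, 2) (1, 6)
```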
Pasting the links to the relevant images (not allowed to paste inline images by stackoverflow).
The graph of train vs validation metric(mae)
The graph of train vs validation loss(mae)
Thank you for all the help in advance.
Has anybody trained MobileNet V1 from scratch on CIFAR-10? What was the maximum accuracy you got? My validation accuracy is stuck at 70% after 110 epochs, even though my training accuracy is above 99%. Here is how I am creating the model.
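```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Model

# Create the MobileNet backbone without the classifier head and without pretrained weights
MobileNet_model = tf.keras.applications.MobileNet(include_top=False, weights=None)
# Define the input shape in the first layer of the network (CIFAR-10 images are 32x32x3)
x = Input(shape=(32, 32, 3), name='input')
# Custom model: backbone -> flatten -> dense -> 10-way softmax
model = MobileNet_model(x)
model = Flatten(name='flatten')(model)
model = Dense(1024, activation='relu', name='dense_1')(model)
output = Dense(10, activation=tf.nn.softmax, name='output')(model)
model_regular = Model(x, output, name='model_regular')
```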
I used the Adam optimizer with LR = 0.001, amsgrad=True and a batch size of 64. I also normalized the pixel data by dividing by 255.0. I am not using any data augmentation.
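```python
# `learning_rate` replaces the deprecated `lr` argument
optimizer1 = tf.keras.optimizers.Adam(learning_rate=0.001, amsgrad=True)
model_regular.compile(optimizer=optimizer1, loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model; the test set is used as the validation set
history = model_regular.fit(x_train, y_train_one_hot, validation_data=(x_test, y_test_one_hot), batch_size=64, epochs=100)
```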
I think I am supposed to get at least 75% according to
Am I doing anything wrong, or is this the expected accuracy after 100 epochs? Here is a plot of my validation accuracy.
MobileNet was designed for ImageNet, which is much larger, so training it on CIFAR-10 will inevitably result in overfitting. I would suggest plotting the loss (not the accuracy) for both training and validation/evaluation, training hard until you reach 99% training accuracy, and then observing the validation loss. If the model is overfitting, you will see the validation loss actually increase after reaching its minimum.
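The check described above (validation loss reaching a minimum and then rising) can be sketched as follows. The helper name, the `patience` rule and the loss values are hypothetical; Keras offers a similar built-in check via `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)`.

```python
def best_epoch_before_overfitting(val_losses, patience=3):
    """Return the epoch with the lowest validation loss if the loss has not
    improved for at least `patience` epochs after it; otherwise None."""
    best = min(range(len(val_losses)), key=lambda i: val_losses[i])
    epochs_without_improvement = len(val_losses) - 1 - best
    return best if epochs_without_improvement >= patience else None


history = [1.20, 0.95, 0.85, 0.88, 0.93, 1.01]  # hypothetical validation losses
print(best_epoch_before_overfitting(history))  # 2
```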
A few things to try to reduce overfitting:
add dropout before the fully connected layer
data augmentation - random shift, crop and rotation should be enough
use a smaller width multiplier (see the original paper; it basically just reduces the number of filters per layer), e.g. 0.75 or 0.5, to make the layers thinner
use L2 weight regularization and weight decay
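The "random shift and crop" augmentation from the list above can be sketched in NumPy as a pad-and-random-crop on 32x32 CIFAR-10 images. The 4-pixel zero padding is an assumption, `random_shift` is a hypothetical helper, and rotation is not covered here (recent TensorFlow versions provide preprocessing layers such as `tf.keras.layers.RandomRotation` for that).

```python
import numpy as np


def random_shift(images, pad=4, rng=None):
    """Randomly shift a batch of HxWxC images by up to `pad` pixels.

    Each image is zero-padded by `pad` pixels on every side and an HxW
    window is cropped at a random offset.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, _ = images.shape
    # Zero-pad height and width only (np.pad defaults to constant 0)
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    out = np.empty_like(images)
    for i in range(n):
        top = rng.integers(0, 2 * pad + 1)
        left = rng.integers(0, 2 * pad + 1)
        out[i] = padded[i, top:top + h, left:left + w]
    return out


batch = np.random.default_rng(0).random((8, 32, 32, 3), dtype=np.float32)
augmented = random_shift(batch, rng=np.random.default_rng(1))
```

A fresh random shift would be applied to each training batch, never to the validation data.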
Then there are some usual training tricks:
use learning rate decay e.g. reduce the learning rate from 1e-2 to 1e-4 stepwise or exponentially
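Both decay variants from the tip above can be sketched as functions of the epoch. The two-drop stepwise layout (a factor of 10 at one third and two thirds of training) is an assumed design choice, not something the answer specifies, and both helper names are hypothetical.

```python
def exponential_lr(epoch, n_epochs, lr_start=1e-2, lr_end=1e-4):
    """Geometric decay from lr_start (first epoch) to lr_end (last epoch)."""
    if n_epochs < 2:
        return lr_start
    return lr_start * (lr_end / lr_start) ** (epoch / (n_epochs - 1))


def stepwise_lr(epoch, n_epochs, lr_start=1e-2, lr_end=1e-4, n_drops=2):
    """Constant-factor drops at evenly spaced points from lr_start to lr_end."""
    factor = (lr_end / lr_start) ** (1 / n_drops)  # 0.1 for the defaults
    stage = min(epoch * (n_drops + 1) // n_epochs, n_drops)
    return lr_start * factor ** stage


for epoch in (0, 30, 60, 89):
    print(epoch, f"{exponential_lr(epoch, 90):.1e}", f"{stepwise_lr(epoch, 90):.1e}")
```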
With some hyperparameter search, I got an evaluation loss of 0.85. I didn't use Keras; I wrote MobileNet myself in TensorFlow.
The OP asked about MobileNetV1. Since MobileNetV2 has been published, here is an update on training MobileNetV2 on CIFAR-10:
1) MobileNetV2 is tuned primarily for ImageNet, with an input resolution of 224x224. It has 5 convolution operations with stride 2, so the GlobalAvgPool2D (penultimate) layer receives a feature map of Cx7x7 (224 / 2^5 = 7), where C is the number of filters (1280 for MobileNetV2).
2) For CIFAR-10, I changed the stride in the first three of these layers to 1, so the GlobalAvgPool2D layer receives a feature map of Cx8x8. I also trained with a width parameter of 0.25 (which scales the number of filters, i.e. the channel depth, of each layer). I trained with mixup in MXNet. This gets me a validation accuracy of 93.27%.
3) Another MobileNetV2 implementation that seems to work well for CIFAR-10 is available here - PyTorch-CIFAR
The reported accuracy is 94.43%. This implementation changes the stride of the first two downsampling layers of the original network from 2 to 1, and it uses the full channel width used for ImageNet.
4) Further, I trained a MobileNetV2 on CIFAR-10 with mixup, changing only the stride of the first conv layer from 2 to 1 and using the full width (width parameter == 1.0). The GlobalAvgPool2D (penultimate) layer therefore receives a feature map of Cx2x2. This gets me an accuracy of 92.31%.
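The feature map sizes quoted in the list above follow from halving the resolution once per remaining stride-2 layer. A small sketch, assuming "same"-style padding so that each stride-2 layer maps a size s to ceil(s / 2); the helper name is hypothetical.

```python
import math


def final_feature_map_size(input_size, n_stride2):
    """Spatial size after n_stride2 stride-2 layers (size s -> ceil(s / 2) each)."""
    size = input_size
    for _ in range(n_stride2):
        size = math.ceil(size / 2)
    return size


print(final_feature_map_size(224, 5))  # 7: ImageNet, original MobileNetV2
print(final_feature_map_size(32, 2))   # 8: CIFAR-10, first three strides set to 1
print(final_feature_map_size(32, 3))   # 4: CIFAR-10, first two strides set to 1
print(final_feature_map_size(32, 4))   # 2: CIFAR-10, only the first conv changed
```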
As a learning exercise, I'm training the Inception (v2) model from scratch using the ImageNet dataset from the Kaggle competition. I've heard people say it took them a week or so of training on a GPU to converge this model in this same dataset. I'm currently training it on my MacBook Pro (single CPU), so I'm expecting it to converge in no less than a month or so.
Here's my implementation of the Inception model. Input is 224x224x3 images, with values in range [0, 1].
The learning rate was set to a static 0.01 and I'm using the stochastic gradient descent optimizer.
My question
After 48 hours of training, the training loss suggests that the model is learning from the training data, but the validation loss is beginning to get worse. Ordinarily, this would suggest overfitting. Does it look like something might be wrong with my model or dataset, or is this expected, since I've only trained for 5.8 epochs?
My training and validation loss and accuracy after 1.5 epochs.
Training and validation loss and accuracy after 5.8 epochs.
Some input images as seen by the model, as well as the output of one of the early convolution layers.
I have a deep network made only of fully connected (Dense) layers with the shape 128-256-512-1024-1024. All layers use LeakyReLU activation with no dropout, and the final layer has a softmax activation.
During training, after the 20th epoch the validation/test loss reverses and starts going up, but the test accuracy keeps increasing as well. How does this make sense? Would the test accuracy hold up on new data, or is there some kind of false positive going on here?
I compiled the model like so:
Graphs of my train/test accuracy and loss curves:
This may help. It's the true labels plotted against the predicted labels for the last epoch:
This is easily possible with a loss function that is sensitive to how far an incorrect prediction is from the ground truth, such as cross-entropy, which penalizes confident wrong predictions heavily. You can get 90% of the predictions correct, but if the misses are ridiculously far off the mark, the loss value can still increase. In practice, this can happen when a model fails to capture one or two critical factors in the ground truth.
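A small worked example of accuracy and loss rising together. For simplicity it uses the binary case, where a prediction counts as correct if the true class gets a probability above 0.5; the probabilities are made up for illustration.

```python
import math


def loss_and_accuracy(p_true, threshold=0.5):
    """Mean cross-entropy and accuracy from the probability given to the true class."""
    loss = -sum(math.log(p) for p in p_true) / len(p_true)
    accuracy = sum(p > threshold for p in p_true) / len(p_true)
    return loss, accuracy


early = [0.6, 0.6, 0.4, 0.4]   # hypothetical: hesitant predictions, half correct
late = [0.9, 0.9, 0.9, 0.01]   # hypothetical: more hits, one very confident miss
print(loss_and_accuracy(early))  # loss ≈ 0.714, accuracy 0.5
print(loss_and_accuracy(late))   # loss ≈ 1.230, accuracy 0.75
```

The single confident miss (probability 0.01 on the true class) contributes about 4.6 to the summed loss, which outweighs the gains on the other three samples.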
I am trying to use ConvLSTM layers in Keras 2 to train an action recognition model. The model has 3 ConvLSTM layers and 2 Fully Connected ones.
At every epoch, the accuracy for the first batch (usually the first few batches) is zero, and then it increases to a value higher than in the previous epoch. For example, the first epoch finishes at 0.3, the next at 0.4, and so on.
My question is why does it get back to zero at each epoch?
The ConvLSTM is stateless.
The model is compiled with SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True), for some reason it does not converge using Adam.
To understand why this happens, you need to know how Keras computes accuracy during training:
At the start of each epoch, the running counts of correctly classified examples and of examples seen are reset to zero.
After each batch, both counts are updated, and the displayed accuracy is the number of correctly classified examples divided by all examples seen so far in the epoch.
As your accuracy is quite low, it is very likely that none of the examples in the first few batches is classified correctly, especially with a small batch size. This makes the accuracy 0 at the beginning of each epoch.
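The running accuracy described above can be sketched as follows. This is a simplified model of what the progress bar displays, not Keras's actual code; the batch counts are hypothetical.

```python
def running_accuracy(batches):
    """Yield the accuracy shown after each batch of one epoch.

    batches: list of (n_correct, batch_size) pairs. The counters start at
    zero at the beginning of every epoch, so the displayed value stays at
    0.0 until the first correct prediction of the epoch.
    """
    correct = 0
    seen = 0
    for n_correct, batch_size in batches:
        correct += n_correct
        seen += batch_size
        yield correct / seen


# Hypothetical epoch with batch size 4: no hits in the first two batches
print(list(running_accuracy([(0, 4), (0, 4), (2, 4), (3, 4)])))
```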
My team is training a CNN in TensorFlow for binary classification of damaged/acceptable parts. We created our code by modifying the cifar10 example code. In my prior experience with neural networks, I always trained until the loss was very close to 0 (well below 1). However, we are now evaluating our model on a validation set during training (on a separate GPU), and the precision seems to have stopped increasing after about 6.7k steps, while the loss is still dropping steadily after more than 40k steps. Is this due to overfitting? Should we expect another spike in accuracy once the loss is very close to zero? The current maximum accuracy is not acceptable. Should we kill the run and keep tuning? What do you recommend? Here are our modified code and graphs of the training process.
Precision and Loss Images
A decrease in binary cross-entropy loss does not imply an increase in accuracy. Consider a sample with label 1 whose predicted probability is 0.2, 0.4 and 0.6 at training steps 1, 2 and 3, with a classification threshold of 0.5. Going from step 1 to step 2 decreases the loss but does not increase the accuracy, because the sample is still classified as 0.
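The numbers in that example can be checked directly. The sketch uses the standard single-sample binary cross-entropy; the 0.5 threshold is from the answer.

```python
import math


def binary_cross_entropy(label, p):
    """Cross-entropy for one sample with label in {0, 1} and predicted probability p."""
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


label = 1
for step, p in enumerate([0.2, 0.4, 0.6], start=1):
    predicted_class = int(p > 0.5)
    print(step, round(binary_cross_entropy(label, p), 3), predicted_class == label)
# step 1: 1.609 False, step 2: 0.916 False, step 3: 0.511 True
```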
Ensure that your model has enough capacity by overfitting the training data. If the model is overfitting the training data, avoid overfitting by using regularization techniques such as dropout, L1 and L2 regularization and data augmentation.
Last, confirm your validation data and training data come from the same distribution.
Here are my suggestions. One possible problem is that your network has started to memorize the data; in that case, yes, you should increase regularization.
Here I want to mention one more problem that may cause this:
The class balance in the validation set may be very different from that of the training set. As a first step, I would recommend understanding what your test data (the real-world data your model will face at inference time) looks like descriptively: its class balance ratio and other similar characteristics. Then build the training and validation sets so that they match those descriptive statistics as closely as possible.
I faced a similar situation when I used a softmax function in the last layer instead of a sigmoid for binary classification.
My validation and training losses were decreasing, but the accuracy of both remained constant. This taught me why sigmoid is used for binary classification.
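A minimal way to compare class balance between splits, as suggested above. The label lists are hypothetical stand-ins for the damaged/acceptable data, and both helper names are made up for illustration.

```python
from collections import Counter


def class_ratios(labels):
    """Fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def largest_ratio_gap(a, b):
    """Largest absolute difference in class fraction between two label sets."""
    ra, rb = class_ratios(a), class_ratios(b)
    return max(abs(ra.get(c, 0.0) - rb.get(c, 0.0)) for c in set(ra) | set(rb))


# Hypothetical label lists for a damaged/acceptable classifier
train_labels = ["acceptable"] * 80 + ["damaged"] * 20
val_labels = ["acceptable"] * 50 + ["damaged"] * 50
print(class_ratios(train_labels), class_ratios(val_labels))
print(largest_ratio_gap(train_labels, val_labels))
```

The same comparison can be run against a sample of real-world (inference-time) labels, if one is available.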
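One possible mechanism behind the constant accuracy, assuming the last layer had a single output unit (the answer does not say): a softmax over one unit always outputs 1.0, whatever the logit, while a sigmoid maps the logit to a probability that can actually change. The logits below are arbitrary.

```python
import numpy as np


def softmax(logits):
    """Softmax over the last axis (shifted by the max for numerical stability)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


logits = np.array([[-3.0], [0.0], [2.5]])  # one output unit per sample
print(softmax(logits).ravel())  # every sample gets 1.0, so predictions never change
print(sigmoid(logits).ravel())  # probabilities that depend on the logit
```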