MXNet: Accuracy falls to random prediction after some iterations

When I was training a CNN to classify images of distorted digits (0 to 9), the training and test accuracy improved noticeably at first:
Epoch[0] Batch [100] Train-multi-accuracy_0=0.296000
...
Epoch[0] Batch [500] Train-multi-accuracy_0=0.881900
In Epoch[1] and Epoch[2] the accuracy oscillated slightly between 0.85 and 0.95. However,
Epoch[3] Batch [300] Train-multi-accuracy_0=0.926400
Epoch[3] Batch [400] Train-multi-accuracy_0=0.105300
Epoch[3] Batch [500] Train-multi-accuracy_0=0.098200
From then on, the accuracy stayed around 0.1, which means the network was only making random predictions among the 10 classes.
I repeated the training several times, and this happened every time. What is going wrong?
Is the adaptive learning rate strategy of Adam the reason?
model = mx.model.FeedForward(...,
                             optimizer='adam',
                             num_epoch=50,
                             wd=0.00001,
                             ...,
                             )

What exactly is the model you're training? If you're using the MNIST dataset, a simple 2-layer MLP trained with SGD will usually give you pretty high accuracy.
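For reference, here is a minimal sketch of that suggested baseline, assuming the same old mx.model.FeedForward API the question uses; the layer sizes and learning rate are illustrative, not taken from the question:
import mxnet as mx

# 2-layer MLP baseline; sizes and learning rate are illustrative
data = mx.sym.Variable('data')
fc1  = mx.sym.FullyConnected(data=data, num_hidden=128, name='fc1')
act1 = mx.sym.Activation(data=fc1, act_type='relu', name='relu1')
fc2  = mx.sym.FullyConnected(data=act1, num_hidden=10, name='fc2')
mlp  = mx.sym.SoftmaxOutput(data=fc2, name='softmax')

model = mx.model.FeedForward(symbol=mlp,
                             optimizer='sgd',   # plain SGD instead of Adam
                             learning_rate=0.1,
                             num_epoch=10)
# model.fit(X=train_iter, eval_data=val_iter)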


Somehow my accuracy is very low on CIFAR-10?

correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs, features = net(images)  # this model returns logits and features
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
How do I obtain the GPU results together with their labels?
I get almost 10% accuracy here, but my training accuracy is 70%.
The trick to training on that exact dataset (CIFAR-10) and getting better accuracy is to use data augmentation.
Originally CIFAR-10 has 50,000 images for training and 10,000 for validation.
If you don't augment images while training, you will overfit: training accuracy will be much higher than validation accuracy.
So your goal is to reduce overfitting, and the best way to reduce overfitting is to train on more data (augment your data).
Here is one repo that may help you deal with augmentation in PyTorch.
In PyTorch, check the torchvision transforms to augment your data, such as RandomRotation, Resize, RandomVerticalFlip, RandomSizedCrop, ...
One example of a native PyTorch transform may look like:
from torch.utils.data import DataLoader
import torchvision
from torchvision import transforms

t_train = transforms.Compose([transforms.RandomHorizontalFlip(),
                              transforms.ToTensor(),
                              transforms.RandomErasing(),
                              transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))])
# don't augment the validation set; only convert and normalize it
t_valid = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))])
dl_train = DataLoader(torchvision.datasets.CIFAR10('/data/cifar10', download=True, train=True, transform=t_train),
                      batch_size=bs, shuffle=True)
dl_valid = DataLoader(torchvision.datasets.CIFAR10('/data/cifar10', download=True, train=False, transform=t_valid),
                      batch_size=bs, shuffle=False)

Expected validation accuracy for Keras Mobile Net V1 for CIFAR-10 (training from scratch)

Has anybody trained MobileNet V1 from scratch using CIFAR-10? What was the maximum accuracy you got? I am getting stuck at 70% validation accuracy after 110 epochs, while my training accuracy is above 99%. Here is how I am creating the model.
import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras.models import Model

# create the MobileNet backbone (no ImageNet top, no pretrained weights)
MobileNet_model = tf.keras.applications.MobileNet(include_top=False, weights=None)
# define the input shape in the first layer of the neural network
x = Input(shape=(32, 32, 3), name='input')
# create the custom classification head
model = MobileNet_model(x)
model = Flatten(name='flatten')(model)
model = Dense(1024, activation='relu', name='dense_1')(model)
output = Dense(10, activation=tf.nn.softmax, name='output')(model)
model_regular = Model(x, output, name='model_regular')
I used the Adam optimizer with lr=0.001, amsgrad=True, and a batch size of 64. I also normalized the pixel data by dividing by 255.0. I am not using any data augmentation.
optimizer1 = tf.keras.optimizers.Adam(lr=0.001, amsgrad=True)
model_regular.compile(optimizer=optimizer1, loss='categorical_crossentropy', metrics=['accuracy'])
history = model_regular.fit(x_train, y_train_one_hot,
                            validation_data=(x_test, y_test_one_hot),
                            batch_size=64, epochs=100)  # train the model
I think I am supposed to get at least 75% according to https://arxiv.org/abs/1712.04698
Am I doing anything wrong, or is this the expected accuracy after 100 epochs? Here is a plot of my validation accuracy.
MobileNet was designed to be trained on ImageNet, which is much larger, so training it on CIFAR-10 will inevitably result in overfitting. I would suggest you plot the loss (not accuracy) from both training and validation/evaluation, and try to train hard enough to reach 99% training accuracy, then observe the validation loss. If the model is overfitting, you will see the validation loss actually increase after reaching a minimum.
A few things to try to reduce overfitting (combined in the sketch after this answer):
add dropout before fully connected layer
data augmentation - random shift, crop and rotation should be enough
use a smaller width multiplier (read the original paper; basically, reduce the number of filters per layer), e.g. 0.75 or 0.5, to make the layers thinner
use L2 weight regularization and weight decay
Then there are some usual training tricks:
use learning rate decay e.g. reduce the learning rate from 1e-2 to 1e-4 stepwise or exponentially
With some hyperparameter search, I got an evaluation loss of 0.85. I didn't use Keras; I wrote the MobileNet myself in TensorFlow.
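For illustration, here is a minimal tf.keras sketch (my own, not the answerer's TensorFlow code) combining several of the suggestions above: a 0.5 width multiplier, dropout before the fully connected layer, L2 weight regularization, and stepwise learning rate decay. The specific rates and epochs are illustrative:
import tensorflow as tf

base = tf.keras.applications.MobileNet(input_shape=(32, 32, 3),
                                       alpha=0.5,        # width multiplier: thinner layers
                                       include_top=False,
                                       weights=None,
                                       pooling='avg')
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.5),                        # dropout before the classifier
    tf.keras.layers.Dense(10, activation='softmax',
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
])

# stepwise learning rate decay from 1e-2 down to 1e-4
def schedule(epoch):
    return 1e-2 if epoch < 30 else (1e-3 if epoch < 60 else 1e-4)

model.compile(optimizer=tf.keras.optimizers.SGD(momentum=0.9),
              loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=90,
#           callbacks=[tf.keras.callbacks.LearningRateScheduler(schedule)])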
The OP asked about MobileNetV1. Since MobileNetV2 has been published, here is an update on training MobileNetV2 on CIFAR-10:
1) MobileNetv2 is tuned primarily to work on ImageNet with an initial image resolution of 224x224. It has 5 convolution operations with stride 2. Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx7x7, where C is the number of filters (1280 for MobileNetV2).
2) For CIFAR-10, I changed the stride in the first three of these layers to 1, so the GlobalAvgPool2D gets a feature map of Cx8x8. Secondly, I trained with a width parameter of 0.25 (the width multiplier, which scales the number of channels in each layer). I trained with mixup in mxnet (https://gluon-cv.mxnet.io/model_zoo/classification.html). This gets me a validation accuracy of 93.27.
3) Another MobileNetV2 implementation that seems to work well for CIFAR-10 is available here - PyTorch-CIFAR
The reported accuracy is 94.43. This implementation changes the first two of the original stride-2 downsampling layers to stride 1, and it uses the full channel width as used for ImageNet.
4) Further, I trained a MobileNetV2 on CIFAR-10 with mixup while only altering the stride in the first conv layer from 2 to 1, and used the complete width (width parameter == 1.0). Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx2x2. This gets me an accuracy of 92.31. A sketch of this single stride change is below.
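A minimal PyTorch sketch of point 4, assuming torchvision's mobilenet_v2 (only the first conv's stride is changed; everything else is stock):
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

net = mobilenet_v2(num_classes=10)   # width_mult defaults to 1.0
first_conv = net.features[0][0]      # Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
net.features[0][0] = nn.Conv2d(
    in_channels=first_conv.in_channels,
    out_channels=first_conv.out_channels,
    kernel_size=first_conv.kernel_size,
    stride=1,                        # was 2
    padding=first_conv.padding,
    bias=False,
)

# sanity check: the penultimate feature map is now C x 2 x 2 for 32x32 input
x = torch.randn(1, 3, 32, 32)
print(net.features(x).shape)         # torch.Size([1, 1280, 2, 2])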

What's the difference between the following two ways of loading a model in TensorFlow?

I'm training a model with the tf.layers.batch_normalization API, and after training I need to load the trained model to make predictions on new data. There are two ways to load the weights:
(1):
saver1 = tf.train.Saver(tf.global_variables(), max_to_keep=10)
saver1.restore(sess, '{}'.format(args.restore_ckpt))
(2):
saver2 = tf.train.import_meta_graph('{}.meta'.format(args.restore_ckpt))
saver2.restore(sess, '{}'.format(args.restore_ckpt))
I found that (1) gives high prediction accuracy (say, 97%), but (2) gives much lower accuracy (say, 59%). Does this mean that (2) didn't load the weights of the batch normalization layers correctly? Looking forward to your comments!
UPDATED:
I found that the model loaded by (1) gives the same prediction accuracy no matter what the test batch_size is. I tried batch sizes of 1 and 16; both give 97% accuracy.
It seems that with the weights loaded by (2), I need to add the following code:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer_op.minimize(loss_total_op, global_step=global_step)
Then it also gives a high accuracy of 97%. Maybe I have misunderstood batch normalization.
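For context, here is a minimal sketch of the usual tf.layers.batch_normalization pattern in TF1.x (my own illustration; the layer sizes are hypothetical). The training flag must be fed as False at prediction time so the layer uses its moving mean/variance, and UPDATE_OPS must run during training so those moving statistics get updated at all:
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
training = tf.placeholder(tf.bool, name='training')  # feed False at prediction time

h = tf.layers.dense(x, 256)
h = tf.layers.batch_normalization(h, training=training)
h = tf.nn.relu(h)
logits = tf.layers.dense(h, 10)

# the moving mean/variance are only updated when the UPDATE_OPS collection
# is executed, hence the control dependency around the train op:
# update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# with tf.control_dependencies(update_ops):
#     train_op = tf.train.AdamOptimizer().minimize(loss)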

RNN prediction rate affected by batch size?

I'm using a TensorFlow RNN to predict a bunch of sequences, with GRUCell and dynamic_rnn. For training, I feed a training dataset that I split into 8 batches, each with a batch size of 10 (one batch has shape [10, 6, 2], i.e. [batchsize, seqlen, dim]). To prevent overfitting, I stop training when the prediction accuracy on the training set exceeds 80% (it usually stops at 80%~83%).
After training, I let the same graph just predict (not train) on the same training dataset. But this time, since tf.nn.dynamic_rnn accepts batches of variable size, I can split the dataset into 80 batches, each with a batch size of 1 and shape [1, 10, 2] (simply a smaller batch size and therefore more batches). The accuracy then usually exceeds 90%, appreciably higher than 80%. For some reason, shrinking the batch size leads to a higher accuracy. Why does this happen?
As I understand it, you have a small amount of training data and you also stop early during training. Don't stop blindly to handle overfitting; instead, watch the gap between your training and validation loss. If the gap keeps growing, the model has started to overfit. Also, before training, check whether your dataset is biased or correctly balanced. I think this is happening because you have very little training data, or your dataset is not balanced.
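To make that suggestion concrete, here is a framework-agnostic sketch of stopping on validation loss instead of a fixed training-accuracy threshold (train_one_epoch, compute_loss, and save_checkpoint are hypothetical helpers standing in for one training and one evaluation pass):
best_val, patience, bad_epochs = float('inf'), 5, 0
for epoch in range(max_epochs):
    train_one_epoch()                    # hypothetical training step
    val_loss = compute_loss(valid_data)  # hypothetical evaluation pass
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        save_checkpoint()                # keep the best model so far
    else:
        bad_epochs += 1
        if bad_epochs >= patience:       # validation loss stopped improving
            break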

Best DNNClassifier Configuration for MNIST study in TensorFlow

I am working on the MNIST dataset in TensorFlow with a deep neural network classifier. I am using the following structure for the network.
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

MNIST_DATASET = input_data.read_data_sets(mnist_data_path)
train_data = np.array(MNIST_DATASET.train.images, 'int64')
train_target = np.array(MNIST_DATASET.train.labels, 'int64')
test_data = np.array(MNIST_DATASET.test.images, 'int64')
test_target = np.array(MNIST_DATASET.test.labels, 'int64')
classifier = tf.contrib.learn.DNNClassifier(
feature_columns=[tf.contrib.layers.real_valued_column("", dimension=784)],
n_classes=10, #0 to 9 - 10 classes
hidden_units=[2500, 1000, 1500, 2000, 500],
model_dir="model"
)
classifier.fit(train_data, train_target, steps=1000)
However, I get only 40% accuracy when I run the following line.
accuracy_score = 100*classifier.evaluate(test_data, test_target)['accuracy']
How can I tune the network? Am I doing something wrong? Similar studies report 99% accuracy in academia.
Thank you.
I found an optimal configuration on GitHub.
Firstly, note that this is not the best possible configuration: academic studies have already reached 99.79% accuracy on the test set.
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    n_classes=10,
    hidden_units=[128, 32],
    optimizer=tf.train.ProximalAdagradOptimizer(learning_rate=learning_rate),
    activation_fn=tf.nn.relu
)
Also, the following parameters are passed to the classifier:
epoch = 15000
learning_rate = 0.1
batch_size = 40
In this way, the model reaches 97.83% accuracy on the test set and 99.77% accuracy on the training set.
Speaking from experience, it would be a good idea to have no more than 2 hidden layers in a fully connected network for the MNIST dataset, e.g. hidden_units=[500, 500]. That should get you over 90% accuracy.
What is the problem? An extreme number of model parameters. For example, the second hidden layer alone requires 2500*1000 + 1000 = 2,501,000 parameters. The rule of thumb is to keep the number of trainable parameters somewhat comparable to the number of training examples, or at least so in classical machine learning. Otherwise, regularize the model rigorously.
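To see the scale of the problem, here is a quick count of the trainable parameters in the questioner's architecture (784 inputs, hidden_units=[2500, 1000, 1500, 2000, 500], 10 outputs):
sizes = [784, 2500, 1000, 1500, 2000, 500, 10]
params = sum(n_in * n_out + n_out            # weights + biases per layer
             for n_in, n_out in zip(sizes, sizes[1:]))
print(params)  # 9,972,510 -- roughly 10M parameters vs. 55,000 MNIST training images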
What steps can be taken here?
Use a simpler model: decrease the number of hidden units and the number of layers.
Use a model with a smaller number of parameters. Convolutional layers, for instance, generally use far fewer parameters for the same number of units: 1000 convolutional neurons with 3x3 kernels need only 1000*(3*3+1) = 10,000 parameters.
Apply regularization: batch normalization, noise injection into the input, dropout, and weight decay are good examples to start from (see the sketch after this list).
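A sketch (my own, not a tuned configuration) applying two of these steps to the original estimator, using a much smaller stack plus DNNClassifier's built-in dropout parameter:
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=[tf.contrib.layers.real_valued_column("", dimension=784)],
    n_classes=10,
    hidden_units=[500, 500],  # simpler model, as suggested above
    dropout=0.5,              # probability of dropping a given coordinate
    model_dir="model"
)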