TensorFlow CNN some kernels not learning - tensorflow

I have an AlexNet implementation in TensorFlow which is converging, though not to the supposed loss values... I start with a learning rate of 0.01 (as stated in the paper for ILSVRC2012) and right before the first LR drop, the loss is about 6.0, where it should be around 3.0 and 3.5... now that I exported the weights for visualisation, I noticed that some of the channels in the convolutional layers are not learning... in the first layer I have 96 channels, but only 63 of them have "normal values", the remaining 33 have 0 values...
Here is an image of the kernels, each normalised to its maximum. The unlearned kernels are the ones that are "random".
Any ideias as to why? I have a dropout rate of 50% during train in the first 2 FC layers.
Thanks in advance.

Related

Using Googlenet and Alexnet Model is not giving accuracy on the Cat vs Dog dataset

I am starting to learn Convolutional Neural Networks and have designed the famous MNIST and fashion-MNIST models and obtained good accuracy.
But then I moved to another trivial dataset that is cat vs. Dog dataset from Kaggle, but after applying all my concepts, I learned from Stanford lectures and Andrew ng lectures I was only able to get 80% accuracy. So, I decided to try the GoogleNet and Alexnet, but these model were not able to give me accuracy anything above 50% on 6 epochs.
I wanted to know whether the GoogleNet and ImageNet are designed for 1000 categories output and won't work on 2 categories output?
While making my own model I obtained an accuracy of 80%. I expected the famous GoogleNet model to give me more accuracy, but that's not the case.
Below is the GoogleNet model that I am using:
data=[]
labels=[]
for i in range(0,12499):
img=cv2.imread("train/cat."+str(i)+".jpg")
res = cv2.resize(img, dsize=(224, 224), interpolation=cv2.INTER_CUBIC)
data.append(res)
labels.append(0);
img2=cv2.imread("train/dog."+str(i)+".jpg")
res2 = cv2.resize(img2, dsize=(224,224),interpolation=cv2.INTER_CUBIC)
data.append(res2)
labels.append(1);
train_data, test_data,train_labels, test_labels = train_test_split(data,
labels,
test_size=0.2,
random_state=42)
model=tf.keras.Sequential()
model.add(layers.Conv2D(64,kernel_size=3,activation='relu', input_shape=
(224,224,3)))
model.add(layers.Conv2D(64,kernel_size=3,activation='relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
model.add(layers.Conv2D(128,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(128,kernel_size=3,activation='relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
model.add(layers.Conv2D(256,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(256,kernel_size=3,activation='relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dense(4096,activation='relu'))
model.add(Dense(4096,activation='relu'))
model.add(Dense(2,activation='softmax'))
model.compile(optimizer=tf.train.AdamOptimizer(0.001),
loss='sparse_categorical_c rossentropy',metrics=['accuracy'])
model.fit(x=train_data,y=train_labels,batch_size=32,epochs=10,
validation_data=(test_data,test_labels))
The expected accuracy of the above google model should be more than 50%, but it's ranging between 50% and 51% after 6 epochs.
p.s I changed the last dense layer to 2 instead of 1000, and I am using Keras API for tensor flow.
Any help would be appreciated.
I struggled a bit with this earlier as well.I didn't try it yet on googlenet but I tried it on Alexnet. On Alexnet I managed to get relatively ok results (83%) for cats vs dogs after following closely to the paper. Few things you may want to do:
If you refer to the CS231n notes from Fei Fei Li
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture9.pdf
On slide 10, you will notice that the input layer should be 227 by 227 instead. They also provided the mathematical justification
why it is so.
I started to try and follow other items closely to the original
paper here:
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
These included:
As in the paper section 3.3, adding a normalization layer at the end of the first two max pooling layers. Keras has stopped supporting LRN but I added batch normalization and it works. (I ran an experiment of a model with batch normalization and without. The accuracy difference is 82% versus 62%
As in the paper section 4.2, I added two dropout layers (0.5) at the end of the two fully connected layers.
As in the paper section 5, I changed my batches to 128, SGD momentum of 0.9 and weight decay of 0.0005
As pointed above in one of the comments from your original question,
my final layer was also a single dimension with sigmoid function.
Training for 20 epochs gave me a 83% accuracy. In the original paper, they included data augmentation but I did not include it in my implementation.
Keras has a modified googlenet example. It is modified from the Xecption architecture, I believe one of the derivatives of the inception architecture.
https://keras.io/examples/vision/image_classification_from_scratch/
I have tried it and after running for 15 epochs, accuracy is about 90%
Hope this helps.

Expected validation accuracy for Keras Mobile Net V1 for CIFAR-10 (training from scratch)

Has anybody trained Mobile Net V1 from scratch using CIFAR-10? What was the maximum accuracy you got? I am getting stuck at 70% after 110 epochs. Here is how I am creating the model. However, my training accuracy is above 99%.
#create mobilenet layer
MobileNet_model = tf.keras.applications.MobileNet(include_top=False, weights=None)
# Must define the input shape in the first layer of the neural network
x = Input(shape=(32,32,3),name='input')
#Create custom model
model = MobileNet_model(x)
model = Flatten(name='flatten')(model)
model = Dense(1024, activation='relu',name='dense_1')(model)
output = Dense(10, activation=tf.nn.softmax,name='output')(model)
model_regular = Model(x, output,name='model_regular')
I used Adam optimizer with a LR= 0.001, amsgrad = True and batch size = 64. Also normalized pixel data by dividing by 255.0. I am not using any Data Augmentation.
optimizer1 = tf.keras.optimizers.Adam(lr=0.001, amsgrad=True)
model_regular.compile(optimizer=optimizer1, loss='categorical_crossentropy', metrics=['accuracy'])
history = model_regular.fit(x_train, y_train_one_hot,validation_data=(x_test,y_test_one_hot),batch_size=64, epochs=100) # train the model
I think I am supposed to get at least 75% according to https://arxiv.org/abs/1712.04698
Am I am doing anything wrong or is this the expected accuracy after 100 epochs. Here is a plot of my validation accuracy.
Mobilenet was designed to train Imagenet which is much larger, therefore train it on Cifar10 will inevitably result in overfitting. I would suggest you plot the loss (not acurracy) from both training and validation/evaluation, and try to train it hard to achieve 99% training accuracy, then observe the validation loss. If it is overfitting, you would see that the validation loss will actually increase after reaching minima.
A few things to try to reduce overfitting:
add dropout before fully connected layer
data augmentation - random shift, crop and rotation should be enough
use smaller width multiplier (read the original paper, basically just reduce number of filter per layers) e.g. 0.75 or 0.5 to make the layers thinner.
use L2 weight regularization and weight decay
Then there are some usual training tricks:
use learning rate decay e.g. reduce the learning rate from 1e-2 to 1e-4 stepwise or exponentially
With some hyperparameter search, I got evaluation loss of 0.85. I didn't use Keras, I wrote the Mobilenet myself using Tensorflow.
The OP asked about MobileNetv1. Since MobileNetv2 has been published, here is an update on training MobileNetv2 on CIFAR-10 -
1) MobileNetv2 is tuned primarily to work on ImageNet with an initial image resolution of 224x224. It has 5 convolution operations with stride 2. Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx7x7, where C is the number of filters (1280 for MobileNetV2).
2) For CIFAR10, I changed the stride in the first three of these layers to 1. Thus the GlobalAvgPool2D gets a feature map of Cx8x8. Secondly, I trained with 0.25 on the width parameter (affects the depth of the network). I trained with mixup in mxnet (https://gluon-cv.mxnet.io/model_zoo/classification.html). This gets me a validation accuracy of 93.27.
3) Another MobileNetV2 implementation that seems to work well for CIFAR-10 is available here - PyTorch-CIFAR
The reported accuracy is 94.43. This implementation changes the stride in the first two of the original layers which downsample the resolution to stride 1. And it uses the full width of the channels as used for ImageNet.
4) Further, I trained a MobileNetV2 on CIFAR-10 with mixup while only setting altering the stride in the first conv layer from 2 to 1 and used the complete depth (width parameter==1.0). Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx2x2. This gets me an accuracy of 92.31.

Training Inception V2 from scratch - diverging

As a learning exercise, I'm training the Inception (v2) model from scratch using the ImageNet dataset from the Kaggle competition. I've heard people say it took them a week or so of training on a GPU to converge this model in this same dataset. I'm currently training it on my MacBook Pro (single CPU), so I'm expecting it to converge in no less than a month or so.
Here's my implementation of the Inception model. Input is 224x224x3 images, with values in range [0, 1].
The learning rate was set to a static 0.01 and I'm using the stochastic gradient descent optimizer.
My question
After 48 hours of training, the training loss seems to indicate that it's learning from the training data, but the validation loss is beginning to get worse. Ordinarily, this would feel like the model is overfitting. Does it look like something might be wrong with my model or dataset, or is this perfectly expected, since I've only trained 5.8 epochs?
My training and validation loss and accuracy after 1.5 epochs.
Training and validation loss and accuracy after 5.8 epochs.
Some input images as seen by the model, as well as the output of one of the early convolution layers.

Test loss seems to be increased after 20 epochs

I trained MNIST using 8 layers fully connected neural network (tensorflow) and got my result as shown below. May I know why the test loss increased after the 20 epoch? Is that due to overfitting? These are the network configurations:
L1: 1600 neurons
L2: 800 neurons
L3: 400 neurons
L4: 200 neurons
L5: 100 neurons
L6: 60 neurons
L7: 30 neurons
L8: 10 neurons
Optimizer: Adam (learning_rate = 0.001)
activation function: Relu
batch size: 64
dropout rate: 0.7
epoch:100
This is might very well be due to overfitting. In particular, note how the test loss increases but the test accuracy stays mostly the same (or even keeps increasing). This can arise from the model making wrong predictions (on the test set) with more and more certainty. I.e. it doesn't make more wrong predictions as time goes on (explaining the stable/increasing accuracy) but gets ever more confident in its currently wrong predictions (explaining the increasing cost).
This in turn could be due to the model overfitting on characteristics on the training data that don't generalize to the test data. This is particularly true for MNIST where overfitting to "spurious" features (such as single pixels) is common.
You may have seen the benchmarks listed here, the author uses 2 layers each has 300 neurons and get a high accuracy. You have more neurons which will make the network easier overfitting, so first try to reduce neurons. And you use a large batch, which will make the network hard to convergence, then secondly try using little batch or use even smaller learning rate like .0005. The last thing is to try to use LeakyRelu() or tanh() or even sigmoid(), because Relu() function maybe dead at the late learning process.

Convolutional neural network doesn't classify test set keras

I have a 3-D convolutional neural network [keras, tensorflow] and 3D brain images of people with advanced alzheimer's, early alzheimer's and healthy people (3 classes). I have training set of 324 images and test set of 74 images. When I trained my CNN, I had about 65-70% accuracy but for the test set I had only 30-40%. When I used the test data as validation data then for training set I had no more than 37% accuracy as well and the loss stayed at the same level the whole time. Nevermind which parameters I change, the result is the same. I load my prepared and normalized data from .h5 file into Python, and the input have shape (None, 90, 120, 80, 1). I don't have an idea what may be wrong, I checked the code many times and everything seems to be correct.
My CNN have 4 conv3D layers, 3 max-pooling, activations:relu and batch_normalizations, 3 dense layers and dropout, softmax
I appreciate any help or ideas.
If you only have 65/70% accuracy on your training data that is really poor and indicates your neural network is not converging properly. Your network should be capable of at least overfitting the training data if the structure is complex enough, by effectively learning to hardcode the outputs from the small input sample. By the sound of it, your structure is complex enough.
The first thing to try is to reduce the learning rate by a factor of 10, and turn off validation/early stopping/normalisation/regularisation and any other ways to prevent overfitting. Then rinse, repeat - more iterations, each reducing the LR by a factor of 10 - until you can overfit the training data to where it gets close to 100% on the training data.
You can then work on putting in the proper early stopping, dropout, normalisation, regularisation etc to prevent overfitting with a learning rate you know works.
If dropping the LR doesn't even overfit however small the LR then you have some issue with your NN structure.