Using Googlenet and Alexnet Model is not giving accuracy on the Cat vs Dog dataset - tensorflow

I am starting to learn Convolutional Neural Networks and have designed the famous MNIST and fashion-MNIST models and obtained good accuracy.
I then moved to another simple dataset, the cats vs. dogs dataset from Kaggle. After applying everything I learned from the Stanford lectures and Andrew Ng's lectures, I was only able to get 80% accuracy. So I decided to try GoogleNet and AlexNet, but these models did not give me an accuracy above 50% after 6 epochs.
I wanted to know whether GoogleNet and AlexNet, which were designed for the 1000-category ImageNet output, won't work with a 2-category output?
While making my own model I obtained an accuracy of 80%. I expected the famous GoogleNet model to give me more accuracy, but that's not the case.
Below is the GoogleNet model that I am using:
import cv2
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense, MaxPooling2D
data=[]
labels=[]
for i in range(0,12499):
    # label 0 = cat
    img=cv2.imread("train/cat."+str(i)+".jpg")
    res = cv2.resize(img, dsize=(224, 224), interpolation=cv2.INTER_CUBIC)
    data.append(res)
    labels.append(0)
    # label 1 = dog
    img2=cv2.imread("train/dog."+str(i)+".jpg")
    res2 = cv2.resize(img2, dsize=(224,224),interpolation=cv2.INTER_CUBIC)
    data.append(res2)
    labels.append(1)
train_data, test_data,train_labels, test_labels = train_test_split(data,
                                                                    labels,
                                                                    test_size=0.2,
                                                                    random_state=42)
model=tf.keras.Sequential()
model.add(layers.Conv2D(64,kernel_size=3,activation='relu',
                        input_shape=(224,224,3)))
model.add(layers.Conv2D(64,kernel_size=3,activation='relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
model.add(layers.Conv2D(128,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(128,kernel_size=3,activation='relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
model.add(layers.Conv2D(256,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(256,kernel_size=3,activation='relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(layers.Conv2D(512,kernel_size=3,activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dense(4096,activation='relu'))
model.add(Dense(4096,activation='relu'))
model.add(Dense(2,activation='softmax'))
model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='sparse_categorical_crossentropy',metrics=['accuracy'])
model.fit(x=train_data,y=train_labels,batch_size=32,epochs=10,
          validation_data=(test_data,test_labels))
I expected the accuracy of the above GoogleNet model to be well above 50%, but it stays between 50% and 51% after 6 epochs.
P.S. I changed the last dense layer to 2 units instead of 1000, and I am using the Keras API for TensorFlow.
Any help would be appreciated.

I struggled a bit with this earlier as well. I haven't tried it on GoogleNet yet, but I did try it on AlexNet, where I managed to get relatively OK results (83%) for cats vs. dogs by following the paper closely. A few things you may want to do:
If you refer to the CS231n slides from Fei-Fei Li's course:
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture9.pdf
On slide 10, you will notice that the input layer should be 227 by 227 instead. They also provide the mathematical justification for why this is so.
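The 227 versus 224 point can be checked with the usual output-size formula for a convolution, (W - F + 2P) / S + 1. The snippet below is only an illustrative sketch: the helper name `conv_output_size` is made up here, and the 11x11 kernel with stride 4 and no padding is assumed to be AlexNet's first convolution as described in the paper.

```python
from fractions import Fraction

def conv_output_size(width, kernel, stride, padding=0):
    """Spatial output size (W - F + 2P) / S + 1, kept as an exact fraction."""
    return Fraction(width - kernel + 2 * padding, stride) + 1

for width in (224, 227):
    size = conv_output_size(width, kernel=11, stride=4)
    # A whole number means the filter positions tile the input evenly.
    verdict = "fits evenly" if size.denominator == 1 else "does not fit evenly"
    print(width, float(size), verdict)
```

With a 224-pixel input the result is not a whole number, while 227 gives a clean 55x55 feature map.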
I started to try and follow other items closely to the original
paper here:
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
These included:
As in section 3.3 of the paper, adding a normalization layer after each of the first two max pooling layers. Keras has stopped supporting LRN, so I added batch normalization instead, and it works. (I ran an experiment with and without batch normalization: the accuracy was 82% versus 62%.)
As in section 4.2 of the paper, I added two dropout layers (0.5), one after each of the two fully connected layers.
As in section 5 of the paper, I changed my batch size to 128 and used SGD with a momentum of 0.9 and a weight decay of 0.0005.
As pointed out in one of the comments on your original question, my final layer was also a single unit with a sigmoid function.
Training for 20 epochs gave me an 83% accuracy. The original paper included data augmentation, but I did not include it in my implementation.
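Put together, the changes listed above could look roughly like the following Keras sketch. It is not my exact code: the function name `build_alexnet_like` is made up, the filter counts follow the single-GPU reading of the paper, the L2 penalty is only an approximation of the paper's weight decay, and the learning rate of 0.01 is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Assumption: an L2 penalty stands in for the paper's weight decay of 0.0005;
# the two are related but not identical.
WEIGHT_DECAY = regularizers.l2(5e-4)

def build_alexnet_like(input_shape=(227, 227, 3)):
    """AlexNet-style binary classifier (hypothetical helper)."""
    common = dict(activation="relu", kernel_regularizer=WEIGHT_DECAY)
    return tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(96, 11, strides=4, **common),
        layers.MaxPooling2D(pool_size=3, strides=2),
        layers.BatchNormalization(),  # used in place of LRN
        layers.Conv2D(256, 5, padding="same", **common),
        layers.MaxPooling2D(pool_size=3, strides=2),
        layers.BatchNormalization(),  # used in place of LRN
        layers.Conv2D(384, 3, padding="same", **common),
        layers.Conv2D(384, 3, padding="same", **common),
        layers.Conv2D(256, 3, padding="same", **common),
        layers.MaxPooling2D(pool_size=3, strides=2),
        layers.Flatten(),  # conv feature maps -> vector for the dense layers
        layers.Dense(4096, **common),
        layers.Dropout(0.5),
        layers.Dense(4096, **common),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # single unit for cat vs. dog
    ])

model = build_alexnet_like()
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="binary_crossentropy",  # matches the single sigmoid output
    metrics=["accuracy"],
)
```

Training would then use something like `model.fit(images, labels, batch_size=128, epochs=20)`, with 0/1 labels and images resized to 227x227.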
Keras has a modified GoogleNet example. It is based on the Xception architecture, which I believe is one of the derivatives of the Inception architecture.
https://keras.io/examples/vision/image_classification_from_scratch/
I have tried it, and after running for 15 epochs the accuracy is about 90%.
Hope this helps.

Related

Is validation curve slight greater or lower in CNN models good?

Can you tell me which one among the two is a good validation vs train plot?
Both of them are trained with the same Keras sequential layers, but the second one is trained on more samples, i.e. on an augmented dataset.
I'm a little bit confused about the zigzags in the first plot; otherwise I think it is better than the second.
In the second plot there are no zigzags, but the validation accuracy tends to be a little higher than the training accuracy. Is that overfitting, or is it acceptable?
It is an image detection model; the first model's dataset has 5170 samples and the second has 9743.
The convolutional layers defined for the model building:
tf.keras.layers.Conv2D(128,(3,3), activation = 'relu', input_shape = (150,150,3)),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(64,(3,3), activation = 'relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(32,(3,3), activation = 'relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(512,activation='relu'),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(128,activation='relu'),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Dense(1,activation='sigmoid')
Can the model be improved?
Of the two graphs, the second one, where you have more samples, is better. The reason is that with more samples the model is trained on a much wider probability distribution of images, so when validation is run you have a better chance of correctly classifying each image. You have a lot of dropout in your model. This is good for preventing overfitting; however, because dropout is active only during training, it will lower the training accuracy relative to the validation accuracy. Your model seems to be doing well. It might improve if you add more convolution and max pooling layers. An alternative, of course, is to use transfer learning; I would recommend EfficientNetB3. I also recommend using an adjustable learning rate. The Keras callback ReduceLROnPlateau works well for that purpose. Documentation is here. The code below shows my recommended settings.
rlronp=tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=2,
    verbose=1,
    mode='auto'
)
In model.fit, include callbacks=[rlronp].
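To get a feel for what factor=0.5 and patience=2 do, here is a simplified pure-Python simulation of the reduce-on-plateau rule. It is a sketch rather than the Keras implementation: the function name is made up, the cooldown option is ignored, and the min_delta of 1e-4 is assumed to match the callback's default.

```python
def simulate_reduce_on_plateau(val_losses, lr=1e-3, factor=0.5, patience=2,
                               min_delta=1e-4, min_lr=0.0):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0  # epochs since the last real improvement
    schedule = []
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # halve the rate on a plateau
                wait = 0
        schedule.append(lr)
    return schedule

# Made-up validation losses with two plateaus.
losses = [1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7]
schedule = simulate_reduce_on_plateau(losses)
print(schedule)
```

Each plateau of two epochs without improvement halves the learning rate once, after which the counter starts over.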

RNN Text Generation: How to balance training/test loss with validation loss?

I'm working on a short project that involves implementing a character RNN for text generation. My model uses a single LSTM layer with varying units (messing around with between 50 and 500), dropout at a rate of 0.2, and softmax activation. I'm using RMSprop with a learning rate of 0.01.
My issue is that I can't find a good way to characterize the validation loss. I'm using a validation split of 0.3 and I'm finding that the validation loss starts to become constant after only a few epochs (maybe 2-5 or so) while the training loss keeps decreasing. Does validation loss carry much weight in this sort of problem? The purpose of the model is to generate new strings, so quantifying the validation loss with other strings seems... pointless?
It's hard for me to really find the best model since qualitatively I get the sense that the best model is trained for more epochs than it takes for the validation loss to stop changing but also for fewer epochs than it takes for the training loss to start increasing. I would really appreciate any advice you have regarding this problem as well as any general advice about RNN's for text generation, especially regarding dropout and overfitting. Thanks!
This is the code for fitting the model for every epoch. The callback is a custom callback that just prints a few tests. I'm now realizing that history_callback.history['loss'] is probably the training loss isn't it...
for i in range(num_epochs):
    history_callback = model.fit(x, y,
                                 batch_size=128,
                                 epochs=1,
                                 callbacks=[print_callback],
                                 validation_split=0.3)
    loss_history.append(history_callback.history['loss'])
    validation_loss_history.append(history_callback.history['val_loss'])
My intention for this model isn't to replicate sentences from the training data, rather, I'd like to generate sentence from the same distribution that I'm training on.
1. Yes, history_callback.history['loss'] is the training loss and history_callback.history['val_loss'] is the validation loss.
2. Yes, validation loss carries weight in this sort of problem, because you don't want to merely replicate the sentences given during training; you want the model to learn the patterns in the training data and generate new sentences when it sees new data.
3. From the information in the question and the insights from the comments (thanks to Brian Bartoldson), it appears that your model is overfitting. In addition to EarlyStopping and dropout, you can try the techniques below to mitigate overfitting.
3.a. Shuffle the data by using shuffle=True in model.fit. Code is shown below.
3.b. Use recurrent_dropout. For example, if we set recurrent_dropout to 0.2 in a recurrent layer (LSTM), 20% of the units are dropped for the linear transformation of the recurrent state, i.e. on the connections between time steps.
3.c. Use regularization. You can try l1, l2, or l1_l2 regularization for the kernel_regularizer, recurrent_regularizer, bias_regularizer and activity_regularizer arguments of the LSTM layer.
Sample code to use Shuffle, Early Stopping, Recurrent_Dropout, Regularization is shown below:
import tensorflow as tf
from tensorflow.keras.regularizers import l2
from tensorflow.keras.models import Sequential
model = Sequential()
Regularizer = l2(0.001)
# LSTM layer with input dropout, recurrent dropout and L2 regularization
model.add(tf.keras.layers.LSTM(units = 50, activation='relu',kernel_regularizer=Regularizer ,
                               recurrent_regularizer=Regularizer , bias_regularizer=Regularizer ,
                               activity_regularizer=Regularizer, dropout=0.2, recurrent_dropout=0.3))
# Stop when val_loss has not improved for 15 epochs
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
history_callback = model.fit(x, y,
                             batch_size=128,
                             epochs=1,
                             callbacks=[print_callback, callback],
                             validation_split=0.3, shuffle = True)
Hope this helps. Happy Learning!

What could be reasons for high MAE and MSE in Keras?

My MAE and MSE are quite high. The training data (not including the 20% held out as test data) has shape (1030, 23) after outlier removal with IQR and Z-score. All the categorical columns have been fully encoded.
Epoch: 1900, loss:50195632.3010, mae:3622.3535, mse:50195636.0000, val_loss:65308249.2427, val_mae:4636.2290, val_mse:65308244.0000,
Below is my setting for Keras.
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=[len(dftrain.keys())]),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
])
optimizer = tf.keras.optimizers.RMSprop(0.001)
model.compile(loss='mse',
              optimizer=optimizer,
              metrics=['mae', 'mse'])
EPOCHS = 2000
history = model.fit(
    normed_train_data,
    train_labels,
    epochs=EPOCHS,
    validation_split = 0.2,
    verbose=0,
    callbacks=[tfdocs.modeling.EpochDots()])
What do you think?
"High" MAE is relative: it depends on the scale of the data, and multiple factors could be contributing to it.
If you are getting started, I'd recommend performing Exploratory Data Analysis (EDA) to come up with features and to prepare the data for training.
Once you have verified the data, try tuning the parameters of the model to suit your use case. ML is more about experimenting than about coding.
Notebooks like these in Kaggle will help you get started.
Neural Network Model for House Prices
Comprehensive data exploration with Python
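To make the point about "high" being relative concrete, one option is to compare the model's MAE with a trivial baseline that always predicts the mean target, and with the size of the targets themselves. This is a small sketch with made-up numbers; the helper names `mae` and `mae_in_context` are hypothetical.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between two equally sized arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mae_in_context(y_true, y_pred):
    """Compare the model MAE with a predict-the-mean baseline."""
    y_true = np.asarray(y_true, dtype=float)
    model_mae = mae(y_true, y_pred)
    baseline_mae = mae(y_true, np.full_like(y_true, y_true.mean()))
    return {
        "model_mae": model_mae,
        "baseline_mae": baseline_mae,
        "mae_over_mean_target": model_mae / abs(y_true.mean()),
    }

# Made-up targets and predictions on a "price-like" scale.
y_true = [10000, 20000, 30000, 40000]
y_pred = [12000, 18000, 33000, 37000]
report = mae_in_context(y_true, y_pred)
print(report)
```

An MAE in the thousands can be perfectly reasonable if the targets are in the tens of thousands and the baseline error is several times larger.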
There could be many reasons, actually. My first guess would be your dataset. Is the training data compatible with what the model expects (shapes, formats, etc.)? For example, in text classification, are the texts encoded before being fed to the model?
Are the labels correctly transformed into what the neural network expects?
If yes, the rest comes down to your network definition: are you using the right loss function, layers, etc.?
Try a basic model architecture for your problem. Such a basic architecture can be taken from implementations for similar problems found on the internet, and it will give you a good starting point.
The other answers have already mentioned some good points, but another thing you can do is to normalize your data if you haven't already. NNs are highly sensitive to this. Some methods you can try here are Batch Normalization, Standard Scaler or Min-Max Scaler.
Also, if your model is overfitting (training loss decreasing, but not validation loss), consider adding regularization in the form of Dropout between your layers and see if it improves.
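As a sketch of the standard-scaler idea in plain NumPy: the statistics are computed on the training split only and then reused for any other split. The function names and the tiny arrays are made up for illustration.

```python
import numpy as np

def fit_standardizer(train_features):
    """Column-wise mean and standard deviation of the training features."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero on constant columns
    return mean, std

def standardize(features, mean, std):
    """Scale features with statistics that came from the training split."""
    return (features - mean) / std

# Two made-up columns on very different scales.
train_x = np.array([[1.0, 100.0], [3.0, 300.0], [5.0, 500.0]])
test_x = np.array([[2.0, 400.0]])

mean, std = fit_standardizer(train_x)
train_scaled = standardize(train_x, mean, std)
test_scaled = standardize(test_x, mean, std)  # reuse the training statistics
print(train_scaled)
print(test_scaled)
```

After scaling, every training column has zero mean and unit variance, so no single feature dominates the gradients just because of its units.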

Expected validation accuracy for Keras Mobile Net V1 for CIFAR-10 (training from scratch)

Has anybody trained Mobile Net V1 from scratch using CIFAR-10? What was the maximum accuracy you got? I am getting stuck at 70% after 110 epochs. Here is how I am creating the model. However, my training accuracy is above 99%.
#create mobilenet layer
MobileNet_model = tf.keras.applications.MobileNet(include_top=False, weights=None)
# Must define the input shape in the first layer of the neural network
x = Input(shape=(32,32,3),name='input')
#Create custom model
model = MobileNet_model(x)
model = Flatten(name='flatten')(model)
model = Dense(1024, activation='relu',name='dense_1')(model)
output = Dense(10, activation=tf.nn.softmax,name='output')(model)
model_regular = Model(x, output,name='model_regular')
I used Adam optimizer with a LR= 0.001, amsgrad = True and batch size = 64. Also normalized pixel data by dividing by 255.0. I am not using any Data Augmentation.
optimizer1 = tf.keras.optimizers.Adam(lr=0.001, amsgrad=True)
model_regular.compile(optimizer=optimizer1, loss='categorical_crossentropy', metrics=['accuracy'])
history = model_regular.fit(x_train, y_train_one_hot,validation_data=(x_test,y_test_one_hot),batch_size=64, epochs=100) # train the model
I think I am supposed to get at least 75% according to https://arxiv.org/abs/1712.04698
Am I doing anything wrong, or is this the expected accuracy after 100 epochs? Here is a plot of my validation accuracy.
MobileNet was designed to be trained on ImageNet, which is much larger, so training it on CIFAR-10 will inevitably result in overfitting. I would suggest you plot the loss (not accuracy) for both training and validation/evaluation, train the model hard until it reaches 99% training accuracy, and then observe the validation loss. If it is overfitting, you will see the validation loss actually increase after reaching a minimum.
A few things to try to reduce overfitting:
add dropout before fully connected layer
data augmentation - random shift, crop and rotation should be enough
use a smaller width multiplier (see the original paper; it basically reduces the number of filters per layer), e.g. 0.75 or 0.5, to make the layers thinner.
use L2 weight regularization and weight decay
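In Keras, the items above could be combined roughly as follows. This is a hedged sketch, not what I ran (I wrote my own MobileNet in TensorFlow): the function name, the augmentation strengths, the dropout rate and the L2 strength are all assumptions, and `alpha` is the width multiplier argument of `tf.keras.applications.MobileNet`.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_regularized_mobilenet(num_classes=10, alpha=0.5,
                                dropout_rate=0.5, l2_strength=1e-4):
    """MobileNet V1 for 32x32 inputs with extra regularization (illustrative)."""
    # Data augmentation: random shifts and small rotations, active only in training.
    augment = tf.keras.Sequential([
        layers.RandomTranslation(height_factor=0.1, width_factor=0.1),
        layers.RandomRotation(factor=0.05),
    ], name="augment")
    # Thinner network via the width multiplier, trained from scratch.
    backbone = tf.keras.applications.MobileNet(
        input_shape=(32, 32, 3), alpha=alpha,
        include_top=False, weights=None, pooling="avg")
    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = augment(inputs)
    x = backbone(x)
    x = layers.Dropout(dropout_rate)(x)  # dropout before the fully connected layer
    outputs = layers.Dense(
        num_classes, activation="softmax",
        kernel_regularizer=regularizers.l2(l2_strength))(x)  # L2 on the classifier
    return tf.keras.Model(inputs, outputs, name="mobilenet_regularized")

model = build_regularized_mobilenet()
model.summary()
```

The L2 penalty here only covers the final dense layer; applying it to the backbone as well would require setting regularizers on its layers.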
Then there are some usual training tricks:
use learning rate decay e.g. reduce the learning rate from 1e-2 to 1e-4 stepwise or exponentially
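Both decay styles can be written as small functions of the epoch number. The sketch below is only one way to do it; the function names, the three-stage split and the epoch counts are assumptions, and `total_epochs` must be at least 2 for the exponential variant.

```python
def exponential_lr(epoch, total_epochs, lr_start=1e-2, lr_end=1e-4):
    """Smooth decay from lr_start (epoch 0) to lr_end (last epoch)."""
    fraction = epoch / (total_epochs - 1)
    return lr_start * (lr_end / lr_start) ** fraction

def stepwise_lr(epoch, total_epochs, lr_start=1e-2, lr_end=1e-4, num_steps=3):
    """Piecewise-constant decay over num_steps equally long stages."""
    stage = min(epoch * num_steps // total_epochs, num_steps - 1)
    factor = (lr_end / lr_start) ** (1 / (num_steps - 1))
    return lr_start * factor ** stage

for epoch in (0, 45, 89):
    print(epoch, exponential_lr(epoch, 90), stepwise_lr(epoch, 90))
```

In Keras, a function like this could be wrapped in `tf.keras.callbacks.LearningRateScheduler`, for example `LearningRateScheduler(lambda epoch, lr: stepwise_lr(epoch, 90))`.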
With some hyperparameter search, I got evaluation loss of 0.85. I didn't use Keras, I wrote the Mobilenet myself using Tensorflow.
The OP asked about MobileNetv1. Since MobileNetv2 has been published, here is an update on training MobileNetv2 on CIFAR-10 -
1) MobileNetv2 is tuned primarily to work on ImageNet with an initial image resolution of 224x224. It has 5 convolution operations with stride 2. Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx7x7, where C is the number of filters (1280 for MobileNetV2).
2) For CIFAR-10, I changed the stride in the first three of these layers to 1. Thus the GlobalAvgPool2D gets a feature map of Cx8x8. Secondly, I trained with a width parameter of 0.25 (which scales the number of channels in each layer). I trained with mixup in mxnet (https://gluon-cv.mxnet.io/model_zoo/classification.html). This gets me a validation accuracy of 93.27%.
3) Another MobileNetV2 implementation that seems to work well for CIFAR-10 is available here - PyTorch-CIFAR
The reported accuracy is 94.43. This implementation changes the stride in the first two of the original layers which downsample the resolution to stride 1. And it uses the full width of the channels as used for ImageNet.
4) Further, I trained a MobileNetV2 on CIFAR-10 with mixup while only altering the stride in the first conv layer from 2 to 1, and used the full width (width parameter == 1.0). Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx2x2. This gets me an accuracy of 92.31%.
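The feature map sizes quoted above follow from halving the resolution once per remaining stride-2 layer. A tiny sketch of that arithmetic, assuming 'same'-style padding so each stride-2 layer rounds up; the function name is made up:

```python
def final_feature_map_size(input_size, num_stride2_layers):
    """Spatial size left after the given number of stride-2 layers."""
    size = input_size
    for _ in range(num_stride2_layers):
        size = -(-size // 2)  # ceiling division by 2
    return size

# (input resolution, stride-2 layers kept) for the cases discussed above
cases = {
    "ImageNet, all 5 kept": (224, 5),
    "CIFAR-10, first 3 set to stride 1": (32, 2),
    "CIFAR-10, first 2 set to stride 1": (32, 3),
    "CIFAR-10, first 1 set to stride 1": (32, 4),
}
for name, (size, count) in cases.items():
    print(name, final_feature_map_size(size, count))
```

Leaving all five stride-2 layers in place on a 32x32 image would shrink the map to 1x1 before pooling, which is why these CIFAR-10 variants remove some of them.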

Best DNNClassifier Configuration for MNIST study in TensorFlow

I am working on MNIST dataset on TensorFlow with deep neural networks classifier. I am using the following structure for the network.
MNIST_DATASET = input_data.read_data_sets(mnist_data_path)
train_data = np.array(MNIST_DATASET.train.images, 'int64')
train_target = np.array(MNIST_DATASET.train.labels, 'int64')
test_data = np.array(MNIST_DATASET.test.images, 'int64')
test_target = np.array(MNIST_DATASET.test.labels, 'int64')
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=[tf.contrib.layers.real_valued_column("", dimension=784)],
    n_classes=10, #0 to 9 - 10 classes
    hidden_units=[2500, 1000, 1500, 2000, 500],
    model_dir="model"
)
classifier.fit(train_data, train_target, steps=1000)
However, I got only 40% accuracy when I ran the following line.
accuracy_score = 100*classifier.evaluate(test_data, test_target)['accuracy']
How can I tune the network? Am I doing something wrong? Similar studies have reached 99% accuracy in academia.
Thank you.
I found a better configuration on GitHub.
To be clear, it is not the best possible configuration: academic studies have already reached 99.79% accuracy on the test set.
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    n_classes=10,
    hidden_units=[128, 32],
    optimizer=tf.train.ProximalAdagradOptimizer(learning_rate=learning_rate),
    activation_fn=tf.nn.relu
)
Also, the following parameters are passed to the classifier.
epoch = 15000
learning_rate = 0.1
batch_size = 40
This way, the model reaches 97.83% accuracy on the test set and 99.77% accuracy on the training set.
Speaking from experience, it is a good idea to use no more than 2 hidden layers in a fully connected network for the MNIST dataset, e.g. hidden_units=[500, 500]. That should get to over 90% accuracy.
What is the problem? An extreme number of model parameters. For example, the second hidden layer alone requires 2500*1000+1000 parameters. The rule of thumb is to keep the number of trainable parameters roughly comparable to the number of training examples, at least in classical machine learning. Otherwise, regularize the model rigorously.
What steps can be taken here?
Use a simpler model: decrease the number of hidden units and the number of layers.
Use a model with fewer parameters. Convolutional layers, for instance, generally need far fewer parameters for the same number of units. For instance, 1000 convolutional neurons with 3x3 kernels would need only 1000*(3*3+1) parameters.
Apply regularization: batch normalization, noise injection into the input, dropout and weight decay are good examples to start with.
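The parameter counts mentioned above are easy to verify by hand. Here is a small sketch that counts weights plus biases for the configurations discussed in this thread; the helper names are made up, and the convolution count assumes a single input channel, as in the 1000*(3*3+1) example.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected net, e.g. [784, 500, 500, 10]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def conv_param_count(num_filters, kernel_size, in_channels=1):
    """Weights plus biases of one convolutional layer with square kernels."""
    return num_filters * (kernel_size * kernel_size * in_channels + 1)

configs = {
    "question: [2500, 1000, 1500, 2000, 500]": [784, 2500, 1000, 1500, 2000, 500, 10],
    "suggested: [500, 500]": [784, 500, 500, 10],
    "GitHub config: [128, 32]": [784, 128, 32, 10],
}
for name, sizes in configs.items():
    print(name, dense_param_count(sizes))
print("1000 conv filters, 3x3:", conv_param_count(1000, 3))
```

The network from the question has close to ten million parameters, against tens of thousands of MNIST training images, while the two smaller configurations stay well below a million.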