Why is my Neural Net not learning - tensorflow

I have a CNN that I'm trying to train and I cant figure out why its not learning. It has 32 classes which are different types of clothes of about 1000 images in each folder.
Issue is this is the result at the end of training which takes about 9 hours on my GPU
loss: 3.3403 - acc: 0.0542 - val_loss: 3.3387 - val_acc: 0.0534
If anyone could give me directions on how to get this network to train better I would be grateful.
# dimensions of our images.
img_width, img_height = 228, 228
train_data_dir = 'Clothes/train'
validation_data_dir = 'Clothes/test'
nb_train_samples = 25061
nb_validation_samples = 8360
epochs = 20
batch_size = 64
if K.image_data_format() == 'channels_first':
input_shape = (3, img_width, img_height)
else:
input_shape = (img_width, img_height, 3)
model = Sequential()
model.add(Conv2D(filters=64, kernel_size=2, padding='same', activation='tanh', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
model.add(Conv2D(filters=32, kernel_size=2, padding='same', activation='tanh'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
train_datagen = ImageDataGenerator(
rescale=1. / 255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
[test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
train_data_dir,
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode='categorical',
shuffle = True)
validation_generator = test_datagen.flow_from_directory(
validation_data_dir,
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode='categorical',
shuffle = True)
history = model.fit_generator(
train_generator,
steps_per_epoch=nb_train_samples // batch_size,
epochs=epochs,
validation_data=validation_generator,
validation_steps=nb_validation_samples // batch_size)
Here is the plot of the training & validation loss

A network may not converge/learn for several reasons, but here is a list of tips that I think is relevant in your case (based on my own experience):
Transfer Learning: The first thing you should know is that it's very hard to train an image classifier from scratch for most problems, you need much more computing power and time for that. I strongly recommend using transfer learning. There are multiple trained architectures available in Keras that you can use as initial weights fr your network (or other methods).
Training step: For the optimizer, I recommend to use Adam first and to vary the learning rate to see how the loss responds. Also, since you are using Convolutional Layers, you should consider adding Batch Normalization Layers, that can speed significantly the training time, and change the Convolutional activations to 'relu', which make them much faster to train.
You could also try decreasing the Dropout values but I don't think that's the main issue here. Also, If you are considering training your network from scratch,
you should start with fewer layers and add more gradually to get a better idea of ​​what's going on.
train/test split: I see that you are using 8360 observations in your test set. Given the size of your training set, I think it's too much. 1000 for example is enough. The more training samples you have, the more satisfying your results will be.
Also, before judging the accuracy of your model, you should start by establishing a baseline model to benchmark your model. The baseline model depends on your problem, but in general I choose a model that predicts the most common class in the dataset. You should also look at another metric 'top_k_accuracy' available in Keras that is interesting when you have a relatively high number of classes to predict. It helps you to see how close your model is to the right prediction.

First, in order to keep your sanity, check carefully for any bugs, and that your data is being sent in as intended
You might want to add a Top K accuracy metric to get a better idea of whether it's close to getting it, or totally wrong.
Here are some tuning things to try:
Change the kernel size to 3 and activation to relu
model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
If you think your model is underfitting then try increasing the number of Conv layers per pooling to start with. But you could also increase the number of filters or the number of conv + pool repetitions.
Adam optimizer might learn a bit faster than RMS prop
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Probably the biggest improvement would be to get more data. I think your data set is probably too small for the scope of the problem.
You might want to try transfer learning from a pre-trained image recognition network.

Related

Deep LSTM accuracy not crossing 50%

I am working on a classification problem of the semeval 2017 task 4A dataset can be found here
and I am using deep LSTM network for it. In pre-processing, I have done lower casing->tokenization->lemmatization->removing stop words->removing punctuations. For word embeddings, I have used WORD2VEC model. There are 18,000 samples in my training set and 2000 samples in testing.
The code for my model is
model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.3))
model.add(Bidirectional(LSTM(32, use_bias=True, return_sequences=True)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(32, use_bias=True, return_sequences=True), input_shape=(128, 1,64)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
The value of max_words is 2000 and max_len is 300
But even after this, my testing accuracy is not crossing 50%. I can't figure out the problem.
PS - I am using validation technique too. The loss function is 'Binary Crossentropy' and optimizer is 'Adam'.
Training "LSTM" is very different with other common deep learning model.
I recommend a higher dropout rate like 0.7,0.8. and Adam optimizer is particularly unstable in LSTM with real world data. So, i recommend SGD scheduled for a momentum of 0.9 and ReduceLROnPlateau. You have to do very long training, and if spark loss is observed, the training is going very well. (Spark Loss is a word used by NVIDIA researchers. It refers to a phenomenon in which the value of Loss that appears to converge increases significantly.)

DCNN for Binary Classification Converges to 50%/50%

I am new to Keras, and never asked a question here, so excuse me any rookie mistakes I might make.
What I am trying to do is to implement a binary classifier, operating on images (CTs to be exact).
My model is based on a pretrained net, that performed classification on 14 classes (see wonderful git here https://github.com/jrzech/reproduce-chexnet).
As the saying goes, "crawl before you walk, walk before you run", my current humble goal is to achieve overfitting of the network on some 100 examples.
My current problem is that the net converges to a weird solution, with the output neuron (im using sigmoid) always very close to 50%, with 100% of the predictions going to one class (that way im stuck at about 50% accuracy). My loss and accuracy do not change at all from epoch 1 or so.
Things I tried/considered:
using different optimizers (i used Adam optimizer and the following SGD).
trying also to go with categorical crossentropy (with softmax layer at the end, instead of sigmoid, since some say it might perform better [Keras' fit_generator() for binary classification predictions always 50%).
adding an additional denselayer (I thought i might be underfitting somehow).
tried to maybe change the batchsize, to 128 (and overfit on 1000 examples).
All failed miserably, so im kind of at a lost here. I would be happy to provide more details if needed, and would appreciate any help or insights you might have. Major parts of my code are attached. Note that the ModelFactory() that I'm loading and using is the pretrained one.
Thanks in advance!
data generator code
rescale = 1./255.0
target_size = (224, 224)
batch_size = 128
train_datagen = ImageDataGenerator(
width_shift_range=0.1,
height_shift_range=0.1,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
rescale=rescale
)
train_generator = train_datagen.flow_from_dataframe(
train_csv,
directory=train_path,
x_col='image_name',
y_col='class',
target_size=target_size,
color_mode='rgb',
class_mode='binary',
batch_size=batch_size,
shuffle=True,
)
my model
def get_model():
file_name='/content/brucechou1983_CheXNet_Keras_0.3.0_weights.h5'
base_model = ModelFactory().get_model(class_names=[str(i) for i in range(14)],
weights_path=file_name)
x = base_model.output
x = keras.layers.Dense(1024, activation='relu')(x)
x = keras.layers.BatchNormalization(trainable=True)(x)
predictions = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.models.Model(inputs=base_model.inputs, outputs=predictions)
for layer in base_model.layers:
layer.trainable = False
model.summary()
return model
training the model
class_weight = sklearn.utils.class_weight.compute_class_weight('balanced',np.unique(train_csv['class']), train_csv['class'])
model.compile(keras.optimizers.SGD(lr=1e-6, decay=1e-6, momentum=0.9, nesterov=True),
loss='binary_crossentropy',
metrics=['binary_accuracy'])
history = model.fit_generator(
train_generator,
steps_per_epoch=len(train_generator),
epochs=10,
verbose=1,
class_weight=class_weight
)

How to use augumented data when using transfer learning?

I have used VGG16 for transfer learning and got very low accuracy. Is it possible to use data augmentation technique to increase the accuracy when using transfer learning?
Following is the code for better understanding:
# Show the image paths
train_path = 'myNetDB/train' # Relative Path
valid_path = 'myNetDB/valid'
test_path = 'myNetDB/test'
train_batches = ImageDataGenerator().flow_from_directory(train_path, target_size=(224, 224), classes=['dog', 'cat'], batch_size=10)
valid_batches = ImageDataGenerator().flow_from_directory(valid_path, target_size=(224, 224), classes=['dog', 'cat'], batch_size=4)
test_batches = ImageDataGenerator().flow_from_directory(test_path, target_size=(224, 224), classes=['dog', 'cat'], batch_size=10)
vgg16_model= load_model('Fetched_VGG.h5')
# transform the model to Sequential
model= Sequential()
for layer in vgg16_model.layers[:-1]:
model.add(layer)
# Freezing the layers (Oppose weights to be updated)
for layer in model.layers:
layer.trainable = False
# adding the last layer
model.add(Dense(2, activation='softmax'))
model.compile(Adam(lr=.0001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(train_batches, steps_per_epoch=4,
validation_data=valid_batches, validation_steps=4, epochs=5, verbose=2)
predictions = model.predict_generator(test_batches, steps=1, verbose=0)
If you got very low accuracy, it might be that your dataset is very different from the dataset VGG16 was trained on. There are two possibilities:
your dataset is big enough such that you can train your model starting from the pre-trained weights.
your dataset is small. In this case there are no shortcuts. You should consider a simpler model than VGG16 so that you're less likely to incur in overfitting.
In both cases, to answer your question, yes, augmentation techniques, when done consciously, help increasing the accuracy.

How to create a "none of the above" label in a Keras model

I am working a simple image classification. Each object must fit in one of the categories based on its material (aluminum, iron, copper)
There is only one picture for each class, e.g. all aluminum materials don't appear in the same photo along with iron materials for example. The model is working pretty well and the accuracy is great. However I don't know how to handle images that don't fit any of these 3 categories. Let's say I submit a picture of a piece of wood. This obliviously don't fit in any of the 3 categories, but my model seems to "guess" one of them and give one of these random categories a false positive along with a high probability. I understand the result of model.predict() cannot be zero, the ideal scenario. I have tested both softmax and sigmoid activations to no avail. I also tried to create a bogus category called "none" and trained the model with random photos of objects that do not have any of the aforementioned materials. The result was the whole model to become unreliable and lost most of the accuracy I had before.
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=(64, 64, 3)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(classes, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
checkpoint = ModelCheckpoint(filepath='c:/Users/data/models/model-{epoch:02d}-{val_loss:.2f}.hdf5',save_best_only=True)
callbacks_list = [checkpoint]
model.fit(x_train, y_train,
batch_size=75,
epochs=20,
verbose=1,
validation_data=(x_valid, y_valid), callbacks=[checkpoint])
From my understanding, this is almost impossible with supervised learning.
Supervised learning takes some dataset and let the machine learn. However the category that falls under "none" is simply too much. It is almost impossible to cover all other materials under the category of none. Worst is that supervised learning will mostly recognize what had been trained. so when a completely new stuff appears in your testing. it is very likely, the system will ignore or give one of the result based on the given training.
One approach that is more suitable in your application is unsupervised learning. There should be many resources and researches in using unsupervised learning for image classification. One of the sample paper:
https://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Siva_Looking_Beyond_the_2013_CVPR_paper.pdf
Any feedback is welcomed. I stand corrected. Thank you
You need to add yet another class which you will label items not matching your existing labels. So you have aluminum, iron, copper and none_of_the_above
Change your model in line:
model.add(Dense(classes, activation='sigmoid'))
to
model.add(Dense(classes+1, activation='sigmoid'))
And modify your data to label wood as none_of_the_above. You will need a lot of examples that do not match aluminum, iron, copper.
Now add custom accuracy:
def ignore_accuracy_of_class(class_to_ignore=0):
def ignore_acc(y_true, y_pred):
y_true_class = K.argmax(y_true, axis=-1)
y_pred_class = K.argmax(y_pred, axis=-1)
ignore_mask = K.cast(K.not_equal(y_pred_class, class_to_ignore), 'int32')
matches = K.cast(K.equal(y_true_class, y_pred_class), 'int32') * ignore_mask
accuracy = K.sum(matches) / K.maximum(K.sum(ignore_mask), 1)
return accuracy
return ignore_acc
and add it to compile:
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy', ignore_accuracy_of_class(4)])
4 in ignore_accuracy_of_class is a label to ignore. Now you have both accuracy of the whole model and accuracy for only your selected aluminum, iron, copper labels.

RNN Not Generalizing on Text Classification

I am using keras and RNN to classify slack text data on whether the text is reaction worthy or not (1 - emoji, 0 - no emoji). I have removed usernames and urls from the text as well as dropped duplicates with different target variables.
I am not able to get the model to generalize to unseen data. The loss of the train/val sets look good and continually decrease but the accuracy of the val set only decreases.
I am using a pretrained GLOVE word embedding since my training size is only about 25,000 sentences.
I have added additional layers, changed my regularization value and increased dropout but get similar results. Is my model not complex enough to generalize the data? The times i added additional layers they were much smaller but deeper because the training time was about 2 min per epoch.
Any insight would be appreciated.
embedding_layer = Embedding(len(word_index) + 1,
100,
weights=[embeddings_matrix],
input_length=max_message_length,
embeddings_regularizer=l2(0.001),
trainable=True)
# Creating the Model
model = Sequential()
model.add(embedding_layer)
model.add(Convolution1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.7))
model.add(layers.GRU(128))
model.add(Dropout(0.7))
model.add(Dense(1, activation='sigmoid'))
# Compiling the model with our given Optimizer
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.000025)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
print(model.summary())