Keras CNN: train and validation sets are identical but give different accuracy - tensorflow

I know this is a very bad thing to do, but I noticed something strange using Keras MobileNet:
I use the same data for the training and validation sets:
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(IM_WIDTH, IM_HEIGHT),
    batch_size=batch_size,
    class_mode="categorical"
)
validation_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(IM_WIDTH, IM_HEIGHT),
    class_mode="categorical"
)
but I don't get the same accuracy on both!
epoch 30/30 - loss: 0.3485 - acc: 0.8938 - val_loss: 1.7545 - val_acc: 0.4406
It seems that I am overfitting the training set compared to the validation set... but they are supposed to be the same! How is that possible?

The training loss is calculated on the fly, while the validation loss is only calculated after the epoch has been trained. So at the beginning a nearly untrained net makes the training loss look worse than the model actually is by the end of the epoch. This effect should vanish in later epochs, since by then a single epoch's impact on the score is not that big anymore.
This behaviour is addressed in the Keras FAQ.
If you evaluate both at the end of the epoch with a self-written callback, they should be the same.
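A minimal sketch of such a callback, assuming a TF 2.x tf.keras setup where model.evaluate accepts the generators from the question directly:

from tensorflow import keras

class EvalBothAtEpochEnd(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # evaluate the finished epoch's weights on both generators
        train_loss, train_acc = self.model.evaluate(train_generator, verbose=0)
        val_loss, val_acc = self.model.evaluate(validation_generator, verbose=0)
        print(f"epoch {epoch}: train acc {train_acc:.4f} / val acc {val_acc:.4f}")

model.fit(train_generator,
          validation_data=validation_generator,
          epochs=30,
          callbacks=[EvalBothAtEpochEnd()])

Evaluated this way, both numbers come from exactly the same weights, so any remaining gap would point at the data pipeline rather than at the timing of the measurements.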

For people reading this after a while:
I still don't understand how this issue happened, but working on the batch size (reducing it) helped a lot.

Related

Problem evaluating a trained and loaded CNN Keras model

I am new to the deep neural network world. I tried to train my own model using the TensorFlow Keras toolkit.
I managed to train a model using the fit function. The accuracy over 50 epochs was good - around 96% - and the model predicts well on new data. The problem is that when I try to evaluate the loaded model, the results look as if the model wasn't trained at all (accuracy around 50%).
I prepared a small test: I evaluate the model right after fit, then I save the model, load it, and evaluate it once again. The results are very different. I thought that maybe the weights aren't loaded properly, but the documentation says that save and load operate on the whole model. Here is my code:
CNNmodelHistory = model.fit(train_data, batch_size=batch_size, validation_data=test_data,
                            steps_per_epoch=train_data.samples // batch_size,
                            epochs=echos)
scores = model.evaluate(test_data, verbose=0)
print(f'Test loss: {scores[0]} / Test accuracy: {scores[1]}')

# save the model to disk, then load it and evaluate again
model.save('gender_detection.modelTest')
modelLoaded = keras.models.load_model('gender_detection.modelTest')
scores = modelLoaded.evaluate(test_data, verbose=0)
print(f'Test loss: {scores[0]} / Test accuracy: {scores[1]}')
And here are the results:
Do you have any tips on what I am doing wrong?
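A small sketch of one way to narrow this down, reusing the model and file name from the question: compare the weight tensors before and after the save/load round trip, so a serialization problem can be told apart from a problem in how test_data is fed.

import numpy as np
from tensorflow import keras

model.save('gender_detection.modelTest')
reloaded = keras.models.load_model('gender_detection.modelTest')

# if this passes, the weights survived the round trip and the difference
# must come from the two evaluation runs themselves
for w_orig, w_loaded in zip(model.get_weights(), reloaded.get_weights()):
    assert np.allclose(w_orig, w_loaded), "weights changed during save/load"
print('all weights identical after save/load')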

Defining model to reduce overfitting effect of batch normalization

I'm trying to train my model using transfer learning from a pretrained model, with 30 classes and 7200 images (80% train, 10% validation, 10% test). My model is always overfitting despite changing various parameters. After reading this link https://www.tensorflow.org/tutorials/images/transfer_learning#create_the_base_model_from_the_pre-trained_convnets, I know that batch normalization keeps updating its moving mean and variance even though the convolutional base is frozen.
So I set training=False when calling base_model. But I'm still confused: is my code correct? My images are augmented using ImageDataGenerator, unlike the example where augmentation and preprocessing are used as the base model's input.
This is my code:
# Create the model
inputs = keras.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(len(CLASS_NAMES), activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)

history = model.fit_generator(train_data_gen,
                              epochs=epochs,
                              steps_per_epoch=int(np.ceil(total_train / float(BATCH_SIZE))),
                              validation_data=val_data_gen,
                              validation_steps=int(np.ceil(total_val / float(BATCH_SIZE))),
                              callbacks=[cm_callback, tensorboard_callback])
Output
576/576 [==============================] - 157s 273ms/step - loss: 0.0075 - accuracy: 0.9996
144/144 [==============================] - 26s 181ms/step - loss: 0.0092 - accuracy: 1.0000
[0.007482105916197825, 0.99956596]
[0.009182391463279297, 1.0]
If my code is correct, is it good that the validation accuracy = 1 (too accurate)?
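For reference, a sketch of the freezing pattern from the linked tutorial, assuming a tf.keras.applications base such as MobileNetV2 (the question does not say which base model is used); the key points are base_model.trainable = False plus training=False in the call:

import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False            # freeze all base weights

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)  # BatchNorm keeps its frozen moving statistics
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(len(CLASS_NAMES), activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)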

Training and testing in convolutional neural networks

I would like to know at what stage the testing dataset is used in CNNs. Is it used after completion of each batch, after each epoch during training, or only after completion of all the epochs? I am a bit confused about how these two processes run together. Similarly, is the gradient update done after each batch or after each epoch?
model.fit_generator(
    aug.flow(x_train, y_train, batch_size=BATCH_SIZE),
    validation_data=(x_test, y_test),
    steps_per_epoch=len(x_train) // BATCH_SIZE,
    epochs=EPOCHS, verbose=1, callbacks=callbacks)
From fit_generator it is only clear that images are loaded into memory batch by batch.
Keras uses the validation dataset at the end of every epoch (if you didn't change validation_freq in the fit function). Each epoch your model trains on the whole training dataset and then evaluates itself on the validation dataset.
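For illustration, a sketch of how validation_freq changes when that evaluation happens, reusing the variables from the question and assuming a TF 2.x setup where model.fit accepts the augmentation generator directly:

# run the validation pass only every 5 epochs instead of after every epoch
model.fit(
    aug.flow(x_train, y_train, batch_size=BATCH_SIZE),
    validation_data=(x_test, y_test),
    steps_per_epoch=len(x_train) // BATCH_SIZE,
    epochs=EPOCHS, verbose=1, callbacks=callbacks,
    validation_freq=5)   # validate after epochs 5, 10, 15, ...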

Understanding epoch, batch size, accuracy and performance gain in lstm forecasting model

I am new to machine learning and LSTMs. I am referring to this link, LSTM for multistep forecasting, specifically the Encoder-Decoder LSTM Model With Multivariate Input section.
Here is my dataset description after reshaping the train and test sets:
print(dataset.shape)
print(train_x.shape, train_y.shape)
print(test.shape)

(2192, 15)
(1806, 14, 14) (1806, 7, 1)
(364, 15)
In the above I have n_input=14, n_out=7.
Here is my LSTM model description:
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 2, 100, 16
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
On fitting the model, I get this output:
Epoch 98/100
- 8s - loss: 64.6554
Epoch 99/100
- 7s - loss: 64.4012
Epoch 100/100
- 7s - loss: 63.9625
According to my understanding (please correct me if I am wrong): here my model accuracy is 63.9625 (looking at the last epoch, 100). Also, this is not stable, since there is a gap between epoch 99 and epoch 100.
Here are my questions:
How are the epochs and batch size defined above related to model accuracy? How does increasing or decreasing them affect model accuracy?
Are my above-defined epochs, batch size, and n_input correct for the model?
How can I increase my model accuracy? Is the above dataset size good enough for this model?
I am not able to link all these parameters together; kindly help me understand how to achieve better accuracy with the above factors.
Having a very large number of epochs will not necessarily improve your accuracy. More epochs can increase the accuracy up to a certain limit, beyond which you begin to overfit your model; having very few will result in underfitting. See this. So, looking at the difference between epoch 99 and epoch 100, you can already tell that you are overfitting the model. As a rule of thumb, when you notice the accuracy stops increasing, that is the ideal number of epochs you should have, usually between 1 and 10; 100 seems too much already.
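One way to automate that rule of thumb in Keras is the built-in EarlyStopping callback; the sketch below assumes part of the training data is held out for validation, which the question's build_model does not currently do:

from tensorflow import keras

# stop training when the validation loss stops improving,
# instead of hand-picking the number of epochs
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss',
                                           patience=5,              # tolerate 5 stagnant epochs
                                           restore_best_weights=True)

model.fit(train_x, train_y,
          validation_split=0.1,          # hold out 10% of the training data
          epochs=100, batch_size=16,
          callbacks=[early_stop])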
Batch size does not affect your accuracy; it is mainly used to control training speed based on the memory of your GPU. If you have a lot of memory, you can use a large batch size so training will be faster.
What you can do to increase your accuracy:
1. Increase your dataset for training.
2. Try using convolutional networks instead. Find out more about convolutional networks on this YouTube channel; in a nutshell, CNNs help you identify which features to focus on when training your model.
3. Try other algorithms.
There is no well-defined formula for batch size. Typically a larger batch size will run faster but may compromise your accuracy. You will have to play around with the number.
However, one component with regard to epochs that you are missing is validation. It is normal to have a validation dataset and observe whether the accuracy over this dataset goes up or down. If the accuracy over this dataset stops improving (or goes down), you can multiply your learning rate by 0.8. See this link: https://machinelearningmastery.com/difference-test-validation-datasets/
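In Keras, that kind of schedule can be expressed with the built-in ReduceLROnPlateau callback; the sketch below is just one possible configuration, with the 0.8 factor mirroring the suggestion above:

from tensorflow import keras

# multiply the learning rate by 0.8 once the monitored validation loss
# has stopped improving for a few epochs, then keep training
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                              factor=0.8,   # new_lr = old_lr * 0.8
                                              patience=3)

# pass it to fit alongside any other callbacks, e.g. callbacks=[early_stop, reduce_lr]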

Neural network gives different results for each execution

This is the exact code I'm running, with Keras and TensorFlow as the backend. For each run of the same program, the training results are different: sometimes it reaches 100% accuracy at the 400th iteration and sometimes at the 200th.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# XOR toy problem
training_data = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")
target_data = np.array([[0],[1],[1],[0]], "float32")

model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['binary_accuracy'])

model.fit(training_data, target_data, epochs=500, verbose=2)
Epoch 403/500
0s - loss: 0.2256 - binary_accuracy: 0.7500
So why does the result change in each execution when the training data is fixed? I would greatly appreciate an explanation.
The training set is fixed, but the initial weights of the neural network are set to random values in a small range, so each time you train the network you get slightly different results.
If you want reproducible results, you can set the NumPy random seed with numpy.random.seed to a fixed value, so the same initial weights will be used; but beware that this can bias your network.
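A minimal sketch of what that looks like in practice; seeding NumPy alone is often not enough with a TensorFlow backend, so the Python and TensorFlow seeds are set as well:

import random
import numpy as np
import tensorflow as tf

# fix all relevant seeds before building the model so the initial weights,
# and hence the training run, are repeatable
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)   # TF 2.x; TF 1.x uses tf.set_random_seed(42) instead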