Why is the val_acc of my model 4 classification always around 25%? My train_acc is increasing - google-colaboratory

I train my model in Colab. I want to use mobilenet_v2 model to classify picture to 4 categories, but the accuracy is always about 25%.
And the train_acc is increasing, but the val_acc is not.
Part of my codes as follows:
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(input_shape=(224,224,3), weights=None, classes=4)
from tensorflow.keras.optimizers import RMSprop
base_model.compile(loss='categorical_crossentropy',
optimizer=RMSprop(lr=0.001),
metrics=['acc'])
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1/255)
train_generator = train_datagen.flow_from_directory(
'GarbageForMaixHub',
target_size=(224,224),
batch_size=16,
class_mode='categorical'
)
base_model.fit(
train_generator,
steps_per_epoch=64,
epochs=5,
verbose=1,
validation_data=train_generator, # same to train data
validation_steps=16
)
And part of the outputs as follows:
Epoch 1/5
64/64 [==============================] - 15s 226ms/step - loss: 1.1683 - acc: 0.4434 - val_loss: 1.4742 - val_acc: 0.2070
Epoch 2/5
64/64 [==============================] - 15s 228ms/step - loss: 1.2155 - acc: 0.4258 - val_loss: 1.4351 - val_acc: 0.2852
Epoch 3/5
64/64 [==============================] - 15s 228ms/step - loss: 1.1015 - acc: 0.5029 - val_loss: 1.4660 - val_acc: 0.2617
Epoch 4/5
64/64 [==============================] - 14s 223ms/step - loss: 1.0972 - acc: 0.5088 - val_loss: 1.5096 - val_acc: 0.2852
Epoch 5/5
64/64 [==============================] - 14s 222ms/step - loss: 1.0932 - acc: 0.5195 - val_loss: 1.6133 - val_acc: 0.2266
<keras.callbacks.History at 0x7ff01f7eeeb0>
The val_acc is so different from train_acc when I used my train_generator as my validation_generator.
What could I do to solve, or to avoid such problem?
I feel a little clueless. I tried some diffirent model such as tf.keras.applications.EfficientNetB0, but it's no different. I have check the picture and labels, no problems found.

Related

Questions about the number of learning [duplicate]

So I've been following Google's official tensorflow guide and trying to build a simple neural network using Keras. But when it comes to training the model, it does not use the entire dataset (with 60000 entries) and instead uses only 1875 entries for training. Any possible fix?
import tensorflow as tf
from tensorflow import keras
import numpy as np
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
class_names = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']
model = keras.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dense(10)
])
model.compile(optimizer='adam',
loss= tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
Output:
Epoch 1/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3183 - accuracy: 0.8866
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3169 - accuracy: 0.8873
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3144 - accuracy: 0.8885
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3130 - accuracy: 0.8885
Epoch 5/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3110 - accuracy: 0.8883
Epoch 6/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3090 - accuracy: 0.8888
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3073 - accuracy: 0.8895
Epoch 8/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3057 - accuracy: 0.8900
Epoch 9/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3040 - accuracy: 0.8905
Epoch 10/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3025 - accuracy: 0.8915
<tensorflow.python.keras.callbacks.History at 0x7fbe0e5aebe0>
Here's the original google colab notebook where I've been working on this: https://colab.research.google.com/drive/1NdtzXHEpiNnelcMaJeEm6zmp34JMcN38
The number 1875 shown during fitting the model is not the training samples; it is the number of batches.
model.fit includes an optional argument batch_size, which, according to the documentation:
If unspecified, batch_size will default to 32.
So, what happens here is - you fit with the default batch size of 32 (since you have not specified anything different), so the total number of batches for your data is
60000/32 = 1875
It does not train on 1875 samples.
Epoch 1/10
1875/1875 [===
1875 here is the number of steps, not samples. In fit method, there is an argument, batch_size. The default value for it is 32. So 1875*32=60000. The implementation is correct.
If you train it with batch_size=16, you will see the number of steps will be 3750 instead of 1875, since 60000/16=3750.
Just use batch_size = 1, if you want the entire 60000 data samples to be visible.

why keras wont train on my entire image set

I am training a convolutional model in keras.
The size of my training data is
(60000, 28, 28, 1)
I build a simple model here
model = Sequential()
model.add(
Conv2D(
filters=32,
kernel_size=(4, 4),
input_shape=(28, 28, 1),
activation='relu'
)
)
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(
loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
When I tried to fit this model to my data in the following line
model.fit(
x_train,
y_train_hot_encode,
epochs=20,
validation_data=(x_test, y_test_hot_encode),
)
I noticed something weird in the logs
Epoch 1/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.5311 - accuracy: 0.8109 - val_loss: 0.3381 - val_accuracy: 0.8780
Epoch 2/20
1875/1875 [==============================] - 19s 10ms/step - loss: 0.2858 - accuracy: 0.8948 - val_loss: 0.2820 - val_accuracy: 0.8973
Epoch 3/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.2345 - accuracy: 0.9150 - val_loss: 0.2732 - val_accuracy: 0.9001
Epoch 4/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.2016 - accuracy: 0.9247 - val_loss: 0.2549 - val_accuracy: 0.9077
Epoch 5/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1644 - accuracy: 0.9393 - val_loss: 0.2570 - val_accuracy: 0.9077
Epoch 6/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1434 - accuracy: 0.9466 - val_loss: 0.2652 - val_accuracy: 0.9119
Epoch 7/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1225 - accuracy: 0.9553 - val_loss: 0.2638 - val_accuracy: 0.9135
As you can see, each epoch was trained on 1875 images and not on the entire 60K images, why is that? or am I reading the log the wrong way?
It is because the number shown there is number of steps instead of number of examples trained. As you didn't supply batch_size into model.fit(), it used the default batch size 32.
The expected number of steps per epoch is ceil(60000/32) = 1875, consistent with what is shown in the log.

TensorFlow accuracy metrics

The following is a very simple TensorFlow 2 image classification model.
Note that the loss function is not the usual SparseCategoricalCrossentropy. Also, the last layer has only 1 output, so this is not the usual classification setting. The accuracy here does not have meaning, but I am just curious.
So this code does not work well as we expected, but still produces outputs with an accuracy of around 10%, which seems reasonable.
My question is how this accuracy is calculated? The prediction from this model is a continuous value and the y_true is an integer value. It is not impossible to have an x.0 for the prediction, then the accuracy is too high.
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
model = tf.keras.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dense(1)
])
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)
===
Epoch 1/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8237 - accuracy: 0.0922
Epoch 2/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8266 - accuracy: 0.0931
Epoch 3/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8335 - accuracy: 0.0921
Epoch 4/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8109 - accuracy: 0.0931
Epoch 5/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8210 - accuracy: 0.0926
Epoch 6/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8067 - accuracy: 0.0921
Epoch 7/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8028 - accuracy: 0.0925
Epoch 8/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8070 - accuracy: 0.0929
Epoch 9/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.7879 - accuracy: 0.0925
Epoch 10/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8055 - accuracy: 0.0914
<tensorflow.python.keras.callbacks.History at 0x7f65db17df10>
So, I have searched the TensorFlow API document to find the following example. And it makes sense.
m = tf.keras.metrics.Accuracy()
m.update_state([[1], [2], [3], [4]], [[0], [2], [3], [4]])
m.result().numpy()
===
0.75
So I have tried the following code and get the 0.0 accuracy.
m = tf.keras.metrics.Accuracy()
m.update_state(model.predict(x_train), y_train)
m.result().numpy()
===
0.0
Is there any explanation for this?
look at the shapes of y_true and y_pred metrices, because their shape is [n,1] not the [n] | (where n is number of instances).
so, you have to expand the shape over the -1 axis.

Why the accuracy and validation_accuracy are drastically different on the same dataset (no normalization or dropout)?

I'm new to tensorflow 2.0 and I'm running a very simple model that classifies a 1d time series of fixed size (100 values) into one of two classes:
model = tf.keras.models.Sequential([
tf.keras.layers.Dense(512, activation='relu', input_shape=(100, 1)),
tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
I have a dataset of ~660,000 labeled examples that I feed into the model with batch_size=256. When I train the NN for 10 epochs, using the same data as a validation dataset
history = model.fit(training_dataset,
epochs=10,
verbose=1,
validation_data=training_dataset)
I got the following output
Epoch 1/10
2573/2573 [==============================] - 55s 21ms/step - loss: 0.5271 - acc: 0.7433 - val_loss: 3.4160 - val_acc: 0.4282
Epoch 2/10
2573/2573 [==============================] - 55s 21ms/step - loss: 0.5673 - acc: 0.7318 - val_loss: 3.3634 - val_acc: 0.4282
Epoch 3/10
2573/2573 [==============================] - 55s 21ms/step - loss: 0.5628 - acc: 0.7348 - val_loss: 2.6422 - val_acc: 0.4282
Epoch 4/10
2573/2573 [==============================] - 57s 22ms/step - loss: 0.5589 - acc: 0.7314 - val_loss: 2.6799 - val_acc: 0.4282
Epoch 5/10
2573/2573 [==============================] - 56s 22ms/step - loss: 0.5683 - acc: 0.7278 - val_loss: 2.3266 - val_acc: 0.4282
Epoch 6/10
2573/2573 [==============================] - 55s 21ms/step - loss: 0.5644 - acc: 0.7276 - val_loss: 2.3177 - val_acc: 0.4282
Epoch 7/10
2573/2573 [==============================] - 56s 22ms/step - loss: 0.5664 - acc: 0.7255 - val_loss: 2.3848 - val_acc: 0.4282
Epoch 8/10
2573/2573 [==============================] - 55s 21ms/step - loss: 0.5711 - acc: 0.7237 - val_loss: 2.2369 - val_acc: 0.4282
Epoch 9/10
2573/2573 [==============================] - 55s 22ms/step - loss: 0.5739 - acc: 0.7189 - val_loss: 2.6969 - val_acc: 0.4282
Epoch 10/10
2573/2573 [==============================] - 219s 85ms/step - loss: 0.5778 - acc: 0.7213 - val_loss: 2.5662 - val_acc: 0.4282
How come the accuracy during the training is so different from the validation step, when run on the same dataset? I tried to find some explanation but it seems that such problems usually arise when people use BatchNormalization or Dropout layers, which is not the case here.
Based on the information above, I may assume your data has strong dependencies on examples that are closer to each other in time series.
Therefore, NN data flow will likely be like this:
NN takes the first batch, calculates the loss, and updates the weights and biases
the cycle repeats on and on
but since examples in batches not that far away from each other in time series it is easier for NN to update weights accordingly, making loss reasonably low for every next batch
When it is time for validation NN just calculates the loss without updating the weights,
so you end up with NN that learned how to infer on a small portion of the data but do not generalize well on the whole dataset.
That is why validation error is different from training even on the same dataset.
And a list of reasons is not limited to this, this is just one assumption.

Keras - training loss vs validation loss

Just for the sake of the argument I am using the same data during training for training and validation, like this:
model.fit_generator(
generator=train_generator,
epochs=EPOCHS,
steps_per_epoch=train_generator.n // BATCH_SIZE,
validation_data=train_generator,
validation_steps=train_generator.n // BATCH_SIZE
)
So I would expect that the loss and the accuracy of training and validation at the end of each epoch would be pretty much the same? Still it looks like this:
Epoch 1/150
26/26 [==============================] - 55s 2s/step - loss: 1.5520 - acc: 0.3171 - val_loss: 1.6646 - val_acc: 0.2796
Epoch 2/150
26/26 [==============================] - 46s 2s/step - loss: 1.2924 - acc: 0.4996 - val_loss: 1.5895 - val_acc: 0.3508
Epoch 3/150
26/26 [==============================] - 46s 2s/step - loss: 1.1624 - acc: 0.5873 - val_loss: 1.6197 - val_acc: 0.3262
Epoch 4/150
26/26 [==============================] - 46s 2s/step - loss: 1.0601 - acc: 0.6265 - val_loss: 1.9420 - val_acc: 0.3150
Epoch 5/150
26/26 [==============================] - 46s 2s/step - loss: 0.9790 - acc: 0.6640 - val_loss: 1.9667 - val_acc: 0.2823
Epoch 6/150
26/26 [==============================] - 46s 2s/step - loss: 0.9191 - acc: 0.6951 - val_loss: 1.8594 - val_acc: 0.3342
Epoch 7/150
26/26 [==============================] - 46s 2s/step - loss: 0.8811 - acc: 0.7087 - val_loss: 2.3223 - val_acc: 0.2869
Epoch 8/150
26/26 [==============================] - 46s 2s/step - loss: 0.8148 - acc: 0.7379 - val_loss: 1.9683 - val_acc: 0.3358
Epoch 9/150
26/26 [==============================] - 46s 2s/step - loss: 0.8068 - acc: 0.7307 - val_loss: 2.1053 - val_acc: 0.3312
Why does especially the accuracy differ so much although its from the same data source? Is there something about the way how this is calculated that I am missing?
The generator is created like this:
train_images = keras.preprocessing.image.ImageDataGenerator(
rescale=1./255
)
train_generator = train_images.flow_from_directory(
directory="data/superheros/images/train",
target_size=(299, 299),
batch_size=BATCH_SIZE,
shuffle=True
)
Yes, it shuffles the images, but as it iterates over all images also for validation, shouldn't the accuracy at least be close?
So the model looks like this:
inceptionV3 = keras.applications.inception_v3.InceptionV3(include_top=False)
features = inceptionV3.output
net = keras.layers.GlobalAveragePooling2D()(features)
predictions = keras.layers.Dense(units=2, activation="softmax")(net)
for layer in inceptionV3.layers:
layer.trainable = False
model = keras.Model(inputs=inceptionV3.input, outputs=predictions)
optimizer = keras.optimizers.RMSprop()
model.compile(
optimizer=optimizer,
loss="categorical_crossentropy",
metrics=['accuracy']
)
So no dropout or anything, just the inceptionv3 with a softmax layer on top. I would expect that the accuracy differs a bit, but not in this magnitude.
Are you sure your train_generator returns the same data when Keras retrieves training data and validation data, if it's a generator?
The name being generator, I'd expect it not to :)