why keras wont train on my entire image set - tensorflow

I am training a convolutional model in keras.
The size of my training data is
(60000, 28, 28, 1)
I build a simple model here
model = Sequential()
model.add(
Conv2D(
filters=32,
kernel_size=(4, 4),
input_shape=(28, 28, 1),
activation='relu'
)
)
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(
loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
When I tried to fit this model to my data in the following line
model.fit(
x_train,
y_train_hot_encode,
epochs=20,
validation_data=(x_test, y_test_hot_encode),
)
I noticed something weird in the logs
Epoch 1/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.5311 - accuracy: 0.8109 - val_loss: 0.3381 - val_accuracy: 0.8780
Epoch 2/20
1875/1875 [==============================] - 19s 10ms/step - loss: 0.2858 - accuracy: 0.8948 - val_loss: 0.2820 - val_accuracy: 0.8973
Epoch 3/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.2345 - accuracy: 0.9150 - val_loss: 0.2732 - val_accuracy: 0.9001
Epoch 4/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.2016 - accuracy: 0.9247 - val_loss: 0.2549 - val_accuracy: 0.9077
Epoch 5/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1644 - accuracy: 0.9393 - val_loss: 0.2570 - val_accuracy: 0.9077
Epoch 6/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1434 - accuracy: 0.9466 - val_loss: 0.2652 - val_accuracy: 0.9119
Epoch 7/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1225 - accuracy: 0.9553 - val_loss: 0.2638 - val_accuracy: 0.9135
As you can see, each epoch was trained on 1875 images and not on the entire 60K images, why is that? or am I reading the log the wrong way?

It is because the number shown there is number of steps instead of number of examples trained. As you didn't supply batch_size into model.fit(), it used the default batch size 32.
The expected number of steps per epoch is ceil(60000/32) = 1875, consistent with what is shown in the log.

Related

Why is the val_acc of my model 4 classification always around 25%? My train_acc is increasing

I train my model in Colab. I want to use mobilenet_v2 model to classify picture to 4 categories, but the accuracy is always about 25%.
And the train_acc is increasing, but the val_acc is not.
Part of my codes as follows:
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(input_shape=(224,224,3), weights=None, classes=4)
from tensorflow.keras.optimizers import RMSprop
base_model.compile(loss='categorical_crossentropy',
optimizer=RMSprop(lr=0.001),
metrics=['acc'])
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1/255)
train_generator = train_datagen.flow_from_directory(
'GarbageForMaixHub',
target_size=(224,224),
batch_size=16,
class_mode='categorical'
)
base_model.fit(
train_generator,
steps_per_epoch=64,
epochs=5,
verbose=1,
validation_data=train_generator, # same to train data
validation_steps=16
)
And part of the outputs as follows:
Epoch 1/5
64/64 [==============================] - 15s 226ms/step - loss: 1.1683 - acc: 0.4434 - val_loss: 1.4742 - val_acc: 0.2070
Epoch 2/5
64/64 [==============================] - 15s 228ms/step - loss: 1.2155 - acc: 0.4258 - val_loss: 1.4351 - val_acc: 0.2852
Epoch 3/5
64/64 [==============================] - 15s 228ms/step - loss: 1.1015 - acc: 0.5029 - val_loss: 1.4660 - val_acc: 0.2617
Epoch 4/5
64/64 [==============================] - 14s 223ms/step - loss: 1.0972 - acc: 0.5088 - val_loss: 1.5096 - val_acc: 0.2852
Epoch 5/5
64/64 [==============================] - 14s 222ms/step - loss: 1.0932 - acc: 0.5195 - val_loss: 1.6133 - val_acc: 0.2266
<keras.callbacks.History at 0x7ff01f7eeeb0>
The val_acc is so different from train_acc when I used my train_generator as my validation_generator.
What could I do to solve, or to avoid such problem?
I feel a little clueless. I tried some diffirent model such as tf.keras.applications.EfficientNetB0, but it's no different. I have check the picture and labels, no problems found.

TensorFlow accuracy metrics

The following is a very simple TensorFlow 2 image classification model.
Note that the loss function is not the usual SparseCategoricalCrossentropy. Also, the last layer has only 1 output, so this is not the usual classification setting. The accuracy here does not have meaning, but I am just curious.
So this code does not work well as we expected, but still produces outputs with an accuracy of around 10%, which seems reasonable.
My question is how this accuracy is calculated? The prediction from this model is a continuous value and the y_true is an integer value. It is not impossible to have an x.0 for the prediction, then the accuracy is too high.
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
model = tf.keras.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dense(1)
])
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)
===
Epoch 1/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8237 - accuracy: 0.0922
Epoch 2/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8266 - accuracy: 0.0931
Epoch 3/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8335 - accuracy: 0.0921
Epoch 4/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8109 - accuracy: 0.0931
Epoch 5/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8210 - accuracy: 0.0926
Epoch 6/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8067 - accuracy: 0.0921
Epoch 7/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8028 - accuracy: 0.0925
Epoch 8/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8070 - accuracy: 0.0929
Epoch 9/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.7879 - accuracy: 0.0925
Epoch 10/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8055 - accuracy: 0.0914
<tensorflow.python.keras.callbacks.History at 0x7f65db17df10>
So, I have searched the TensorFlow API document to find the following example. And it makes sense.
m = tf.keras.metrics.Accuracy()
m.update_state([[1], [2], [3], [4]], [[0], [2], [3], [4]])
m.result().numpy()
===
0.75
So I have tried the following code and get the 0.0 accuracy.
m = tf.keras.metrics.Accuracy()
m.update_state(model.predict(x_train), y_train)
m.result().numpy()
===
0.0
Is there any explanation for this?
look at the shapes of y_true and y_pred metrices, because their shape is [n,1] not the [n] | (where n is number of instances).
so, you have to expand the shape over the -1 axis.

Why the accuracy and validation_accuracy are drastically different on the same dataset (no normalization or dropout)?

I'm new to tensorflow 2.0 and I'm running a very simple model that classifies a 1d time series of fixed size (100 values) into one of two classes:
model = tf.keras.models.Sequential([
tf.keras.layers.Dense(512, activation='relu', input_shape=(100, 1)),
tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
I have a dataset of ~660,000 labeled examples that I feed into the model with batch_size=256. When I train the NN for 10 epochs, using the same data as a validation dataset
history = model.fit(training_dataset,
epochs=10,
verbose=1,
validation_data=training_dataset)
I got the following output
Epoch 1/10
2573/2573 [==============================] - 55s 21ms/step - loss: 0.5271 - acc: 0.7433 - val_loss: 3.4160 - val_acc: 0.4282
Epoch 2/10
2573/2573 [==============================] - 55s 21ms/step - loss: 0.5673 - acc: 0.7318 - val_loss: 3.3634 - val_acc: 0.4282
Epoch 3/10
2573/2573 [==============================] - 55s 21ms/step - loss: 0.5628 - acc: 0.7348 - val_loss: 2.6422 - val_acc: 0.4282
Epoch 4/10
2573/2573 [==============================] - 57s 22ms/step - loss: 0.5589 - acc: 0.7314 - val_loss: 2.6799 - val_acc: 0.4282
Epoch 5/10
2573/2573 [==============================] - 56s 22ms/step - loss: 0.5683 - acc: 0.7278 - val_loss: 2.3266 - val_acc: 0.4282
Epoch 6/10
2573/2573 [==============================] - 55s 21ms/step - loss: 0.5644 - acc: 0.7276 - val_loss: 2.3177 - val_acc: 0.4282
Epoch 7/10
2573/2573 [==============================] - 56s 22ms/step - loss: 0.5664 - acc: 0.7255 - val_loss: 2.3848 - val_acc: 0.4282
Epoch 8/10
2573/2573 [==============================] - 55s 21ms/step - loss: 0.5711 - acc: 0.7237 - val_loss: 2.2369 - val_acc: 0.4282
Epoch 9/10
2573/2573 [==============================] - 55s 22ms/step - loss: 0.5739 - acc: 0.7189 - val_loss: 2.6969 - val_acc: 0.4282
Epoch 10/10
2573/2573 [==============================] - 219s 85ms/step - loss: 0.5778 - acc: 0.7213 - val_loss: 2.5662 - val_acc: 0.4282
How come the accuracy during the training is so different from the validation step, when run on the same dataset? I tried to find some explanation but it seems that such problems usually arise when people use BatchNormalization or Dropout layers, which is not the case here.
Based on the information above, I may assume your data has strong dependencies on examples that are closer to each other in time series.
Therefore, NN data flow will likely be like this:
NN takes the first batch, calculates the loss, and updates the weights and biases
the cycle repeats on and on
but since examples in batches not that far away from each other in time series it is easier for NN to update weights accordingly, making loss reasonably low for every next batch
When it is time for validation NN just calculates the loss without updating the weights,
so you end up with NN that learned how to infer on a small portion of the data but do not generalize well on the whole dataset.
That is why validation error is different from training even on the same dataset.
And a list of reasons is not limited to this, this is just one assumption.

Tensorflow 2: Customized Loss Function works differently from the original Keras SparseCategoricalCrossentropy

I just started to work with tensorflow 2.0 and followed the simple example from its official website.
import tensorflow as tf
import tensorflow.keras.layers as layers
mnist = tf.keras.datasets.mnist
(t_x, t_y), (v_x, v_y) = mnist.load_data()
model = tf.keras.Sequential()
model.add(layers.Flatten())
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(10))
lossFunc = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=lossFunc,
metrics=['accuracy'])
model.fit(t_x, t_y, epochs=5)
The output for the above code is:
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 4s 60us/sample - loss: 2.5368 - accuracy: 0.7455
Epoch 2/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.5846 - accuracy: 0.8446
Epoch 3/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4751 - accuracy: 0.8757
Epoch 4/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4112 - accuracy: 0.8915
Epoch 5/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.3732 - accuracy: 0.9018
However, if I change the lossFunc to the following:
def myfunc(y_true, y_pred):
return lossFunc(y_true, y_pred)
which just simply wrap the previous function, it performs totally differently. The output is:
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 4s 60us/sample - loss: 2.4444 - accuracy: 0.0889
Epoch 2/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.5696 - accuracy: 0.0933
Epoch 3/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4493 - accuracy: 0.0947
Epoch 4/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4046 - accuracy: 0.0947
Epoch 5/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.3805 - accuracy: 0.0943
The loss values are very similar but the accuracy values are totally different. Anyone knows what is the magic in it, and what is the correct way to write your own loss function?
When you use built-in loss function, you can use 'accuracy' as metric . Under the hood, tensorflow will select appropriate accuracy function (in your case it is tf.keras.metrics.SparseCategoricalAccuracy()).
When you define custom_loss function, then tensorflow doesn't know which accuracy function to use. In this case, you need to explicitly specify that it is tf.keras.metrics.SparseCategoricalAccuracy(). Please check the gist hub gist here.
The code modification and the output is as follows
model2 = tf.keras.Sequential()
model2.add(layers.Flatten())
model2.add(layers.Dense(128, activation="relu"))
model2.add(layers.Dropout(0.2))
model2.add(layers.Dense(10))
lossFunc = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model2.compile(optimizer='adam', loss=myfunc,
metrics=['accuracy',tf.keras.metrics.SparseCategoricalAccuracy()])
model2.fit(t_x, t_y, epochs=5)
output
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 5s 81us/sample - loss: 2.2295 - accuracy: 0.0917 - sparse_categorical_accuracy: 0.7483
Epoch 2/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.5827 - accuracy: 0.0922 - sparse_categorical_accuracy: 0.8450
Epoch 3/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.4602 - accuracy: 0.0933 - sparse_categorical_accuracy: 0.8760
Epoch 4/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.4197 - accuracy: 0.0946 - sparse_categorical_accuracy: 0.8910
Epoch 5/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.3965 - accuracy: 0.0937 - sparse_categorical_accuracy: 0.8979
<tensorflow.python.keras.callbacks.History at 0x7f5095286780>
Hope this helps

Keras - training loss vs validation loss

Just for the sake of the argument I am using the same data during training for training and validation, like this:
model.fit_generator(
generator=train_generator,
epochs=EPOCHS,
steps_per_epoch=train_generator.n // BATCH_SIZE,
validation_data=train_generator,
validation_steps=train_generator.n // BATCH_SIZE
)
So I would expect that the loss and the accuracy of training and validation at the end of each epoch would be pretty much the same? Still it looks like this:
Epoch 1/150
26/26 [==============================] - 55s 2s/step - loss: 1.5520 - acc: 0.3171 - val_loss: 1.6646 - val_acc: 0.2796
Epoch 2/150
26/26 [==============================] - 46s 2s/step - loss: 1.2924 - acc: 0.4996 - val_loss: 1.5895 - val_acc: 0.3508
Epoch 3/150
26/26 [==============================] - 46s 2s/step - loss: 1.1624 - acc: 0.5873 - val_loss: 1.6197 - val_acc: 0.3262
Epoch 4/150
26/26 [==============================] - 46s 2s/step - loss: 1.0601 - acc: 0.6265 - val_loss: 1.9420 - val_acc: 0.3150
Epoch 5/150
26/26 [==============================] - 46s 2s/step - loss: 0.9790 - acc: 0.6640 - val_loss: 1.9667 - val_acc: 0.2823
Epoch 6/150
26/26 [==============================] - 46s 2s/step - loss: 0.9191 - acc: 0.6951 - val_loss: 1.8594 - val_acc: 0.3342
Epoch 7/150
26/26 [==============================] - 46s 2s/step - loss: 0.8811 - acc: 0.7087 - val_loss: 2.3223 - val_acc: 0.2869
Epoch 8/150
26/26 [==============================] - 46s 2s/step - loss: 0.8148 - acc: 0.7379 - val_loss: 1.9683 - val_acc: 0.3358
Epoch 9/150
26/26 [==============================] - 46s 2s/step - loss: 0.8068 - acc: 0.7307 - val_loss: 2.1053 - val_acc: 0.3312
Why does especially the accuracy differ so much although its from the same data source? Is there something about the way how this is calculated that I am missing?
The generator is created like this:
train_images = keras.preprocessing.image.ImageDataGenerator(
rescale=1./255
)
train_generator = train_images.flow_from_directory(
directory="data/superheros/images/train",
target_size=(299, 299),
batch_size=BATCH_SIZE,
shuffle=True
)
Yes, it shuffles the images, but as it iterates over all images also for validation, shouldn't the accuracy at least be close?
So the model looks like this:
inceptionV3 = keras.applications.inception_v3.InceptionV3(include_top=False)
features = inceptionV3.output
net = keras.layers.GlobalAveragePooling2D()(features)
predictions = keras.layers.Dense(units=2, activation="softmax")(net)
for layer in inceptionV3.layers:
layer.trainable = False
model = keras.Model(inputs=inceptionV3.input, outputs=predictions)
optimizer = keras.optimizers.RMSprop()
model.compile(
optimizer=optimizer,
loss="categorical_crossentropy",
metrics=['accuracy']
)
So no dropout or anything, just the inceptionv3 with a softmax layer on top. I would expect that the accuracy differs a bit, but not in this magnitude.
Are you sure your train_generator returns the same data when Keras retrieves training data and validation data, if it's a generator?
The name being generator, I'd expect it not to :)