Questions about the number of learning [duplicate] - tensorflow

So I've been following Google's official tensorflow guide and trying to build a simple neural network using Keras. But when it comes to training the model, it does not use the entire dataset (with 60000 entries) and instead uses only 1875 entries for training. Any possible fix?
import tensorflow as tf
from tensorflow import keras
import numpy as np
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
class_names = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']
model = keras.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dense(10)
])
model.compile(optimizer='adam',
loss= tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
Output:
Epoch 1/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3183 - accuracy: 0.8866
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3169 - accuracy: 0.8873
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3144 - accuracy: 0.8885
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3130 - accuracy: 0.8885
Epoch 5/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3110 - accuracy: 0.8883
Epoch 6/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3090 - accuracy: 0.8888
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3073 - accuracy: 0.8895
Epoch 8/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3057 - accuracy: 0.8900
Epoch 9/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3040 - accuracy: 0.8905
Epoch 10/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3025 - accuracy: 0.8915
<tensorflow.python.keras.callbacks.History at 0x7fbe0e5aebe0>
Here's the original google colab notebook where I've been working on this: https://colab.research.google.com/drive/1NdtzXHEpiNnelcMaJeEm6zmp34JMcN38

The number 1875 shown during fitting the model is not the training samples; it is the number of batches.
model.fit includes an optional argument batch_size, which, according to the documentation:
If unspecified, batch_size will default to 32.
So, what happens here is - you fit with the default batch size of 32 (since you have not specified anything different), so the total number of batches for your data is
60000/32 = 1875

It does not train on 1875 samples.
Epoch 1/10
1875/1875 [===
1875 here is the number of steps, not samples. In fit method, there is an argument, batch_size. The default value for it is 32. So 1875*32=60000. The implementation is correct.
If you train it with batch_size=16, you will see the number of steps will be 3750 instead of 1875, since 60000/16=3750.

Just use batch_size = 1, if you want the entire 60000 data samples to be visible.

Related

Why is the val_acc of my model 4 classification always around 25%? My train_acc is increasing

I train my model in Colab. I want to use mobilenet_v2 model to classify picture to 4 categories, but the accuracy is always about 25%.
And the train_acc is increasing, but the val_acc is not.
Part of my codes as follows:
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(input_shape=(224,224,3), weights=None, classes=4)
from tensorflow.keras.optimizers import RMSprop
base_model.compile(loss='categorical_crossentropy',
optimizer=RMSprop(lr=0.001),
metrics=['acc'])
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1/255)
train_generator = train_datagen.flow_from_directory(
'GarbageForMaixHub',
target_size=(224,224),
batch_size=16,
class_mode='categorical'
)
base_model.fit(
train_generator,
steps_per_epoch=64,
epochs=5,
verbose=1,
validation_data=train_generator, # same to train data
validation_steps=16
)
And part of the outputs as follows:
Epoch 1/5
64/64 [==============================] - 15s 226ms/step - loss: 1.1683 - acc: 0.4434 - val_loss: 1.4742 - val_acc: 0.2070
Epoch 2/5
64/64 [==============================] - 15s 228ms/step - loss: 1.2155 - acc: 0.4258 - val_loss: 1.4351 - val_acc: 0.2852
Epoch 3/5
64/64 [==============================] - 15s 228ms/step - loss: 1.1015 - acc: 0.5029 - val_loss: 1.4660 - val_acc: 0.2617
Epoch 4/5
64/64 [==============================] - 14s 223ms/step - loss: 1.0972 - acc: 0.5088 - val_loss: 1.5096 - val_acc: 0.2852
Epoch 5/5
64/64 [==============================] - 14s 222ms/step - loss: 1.0932 - acc: 0.5195 - val_loss: 1.6133 - val_acc: 0.2266
<keras.callbacks.History at 0x7ff01f7eeeb0>
The val_acc is so different from train_acc when I used my train_generator as my validation_generator.
What could I do to solve, or to avoid such problem?
I feel a little clueless. I tried some diffirent model such as tf.keras.applications.EfficientNetB0, but it's no different. I have check the picture and labels, no problems found.

why keras wont train on my entire image set

I am training a convolutional model in keras.
The size of my training data is
(60000, 28, 28, 1)
I build a simple model here
model = Sequential()
model.add(
Conv2D(
filters=32,
kernel_size=(4, 4),
input_shape=(28, 28, 1),
activation='relu'
)
)
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(
loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy']
)
When I tried to fit this model to my data in the following line
model.fit(
x_train,
y_train_hot_encode,
epochs=20,
validation_data=(x_test, y_test_hot_encode),
)
I noticed something weird in the logs
Epoch 1/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.5311 - accuracy: 0.8109 - val_loss: 0.3381 - val_accuracy: 0.8780
Epoch 2/20
1875/1875 [==============================] - 19s 10ms/step - loss: 0.2858 - accuracy: 0.8948 - val_loss: 0.2820 - val_accuracy: 0.8973
Epoch 3/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.2345 - accuracy: 0.9150 - val_loss: 0.2732 - val_accuracy: 0.9001
Epoch 4/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.2016 - accuracy: 0.9247 - val_loss: 0.2549 - val_accuracy: 0.9077
Epoch 5/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1644 - accuracy: 0.9393 - val_loss: 0.2570 - val_accuracy: 0.9077
Epoch 6/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1434 - accuracy: 0.9466 - val_loss: 0.2652 - val_accuracy: 0.9119
Epoch 7/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1225 - accuracy: 0.9553 - val_loss: 0.2638 - val_accuracy: 0.9135
As you can see, each epoch was trained on 1875 images and not on the entire 60K images, why is that? or am I reading the log the wrong way?
It is because the number shown there is number of steps instead of number of examples trained. As you didn't supply batch_size into model.fit(), it used the default batch size 32.
The expected number of steps per epoch is ceil(60000/32) = 1875, consistent with what is shown in the log.

TensorFlow accuracy metrics

The following is a very simple TensorFlow 2 image classification model.
Note that the loss function is not the usual SparseCategoricalCrossentropy. Also, the last layer has only 1 output, so this is not the usual classification setting. The accuracy here does not have meaning, but I am just curious.
So this code does not work well as we expected, but still produces outputs with an accuracy of around 10%, which seems reasonable.
My question is how this accuracy is calculated? The prediction from this model is a continuous value and the y_true is an integer value. It is not impossible to have an x.0 for the prediction, then the accuracy is too high.
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
model = tf.keras.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dense(1)
])
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)
===
Epoch 1/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8237 - accuracy: 0.0922
Epoch 2/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8266 - accuracy: 0.0931
Epoch 3/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8335 - accuracy: 0.0921
Epoch 4/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8109 - accuracy: 0.0931
Epoch 5/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8210 - accuracy: 0.0926
Epoch 6/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8067 - accuracy: 0.0921
Epoch 7/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8028 - accuracy: 0.0925
Epoch 8/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8070 - accuracy: 0.0929
Epoch 9/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.7879 - accuracy: 0.0925
Epoch 10/10
1875/1875 [==============================] - 3s 1ms/step - loss: 1.8055 - accuracy: 0.0914
<tensorflow.python.keras.callbacks.History at 0x7f65db17df10>
So, I have searched the TensorFlow API document to find the following example. And it makes sense.
m = tf.keras.metrics.Accuracy()
m.update_state([[1], [2], [3], [4]], [[0], [2], [3], [4]])
m.result().numpy()
===
0.75
So I have tried the following code and get the 0.0 accuracy.
m = tf.keras.metrics.Accuracy()
m.update_state(model.predict(x_train), y_train)
m.result().numpy()
===
0.0
Is there any explanation for this?
look at the shapes of y_true and y_pred metrices, because their shape is [n,1] not the [n] | (where n is number of instances).
so, you have to expand the shape over the -1 axis.

Tensorflow 2: Customized Loss Function works differently from the original Keras SparseCategoricalCrossentropy

I just started to work with tensorflow 2.0 and followed the simple example from its official website.
import tensorflow as tf
import tensorflow.keras.layers as layers
mnist = tf.keras.datasets.mnist
(t_x, t_y), (v_x, v_y) = mnist.load_data()
model = tf.keras.Sequential()
model.add(layers.Flatten())
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(10))
lossFunc = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=lossFunc,
metrics=['accuracy'])
model.fit(t_x, t_y, epochs=5)
The output for the above code is:
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 4s 60us/sample - loss: 2.5368 - accuracy: 0.7455
Epoch 2/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.5846 - accuracy: 0.8446
Epoch 3/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4751 - accuracy: 0.8757
Epoch 4/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4112 - accuracy: 0.8915
Epoch 5/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.3732 - accuracy: 0.9018
However, if I change the lossFunc to the following:
def myfunc(y_true, y_pred):
return lossFunc(y_true, y_pred)
which just simply wrap the previous function, it performs totally differently. The output is:
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 4s 60us/sample - loss: 2.4444 - accuracy: 0.0889
Epoch 2/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.5696 - accuracy: 0.0933
Epoch 3/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4493 - accuracy: 0.0947
Epoch 4/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4046 - accuracy: 0.0947
Epoch 5/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.3805 - accuracy: 0.0943
The loss values are very similar but the accuracy values are totally different. Anyone knows what is the magic in it, and what is the correct way to write your own loss function?
When you use built-in loss function, you can use 'accuracy' as metric . Under the hood, tensorflow will select appropriate accuracy function (in your case it is tf.keras.metrics.SparseCategoricalAccuracy()).
When you define custom_loss function, then tensorflow doesn't know which accuracy function to use. In this case, you need to explicitly specify that it is tf.keras.metrics.SparseCategoricalAccuracy(). Please check the gist hub gist here.
The code modification and the output is as follows
model2 = tf.keras.Sequential()
model2.add(layers.Flatten())
model2.add(layers.Dense(128, activation="relu"))
model2.add(layers.Dropout(0.2))
model2.add(layers.Dense(10))
lossFunc = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model2.compile(optimizer='adam', loss=myfunc,
metrics=['accuracy',tf.keras.metrics.SparseCategoricalAccuracy()])
model2.fit(t_x, t_y, epochs=5)
output
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 5s 81us/sample - loss: 2.2295 - accuracy: 0.0917 - sparse_categorical_accuracy: 0.7483
Epoch 2/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.5827 - accuracy: 0.0922 - sparse_categorical_accuracy: 0.8450
Epoch 3/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.4602 - accuracy: 0.0933 - sparse_categorical_accuracy: 0.8760
Epoch 4/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.4197 - accuracy: 0.0946 - sparse_categorical_accuracy: 0.8910
Epoch 5/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.3965 - accuracy: 0.0937 - sparse_categorical_accuracy: 0.8979
<tensorflow.python.keras.callbacks.History at 0x7f5095286780>
Hope this helps

Why training accuracy and validation accuracy are different for the same dataset with tensorflow2.0?

I am training with tensorflow2.0 and tensorflow_datasets. But I am not understand: why does the training accuracy and loss and valdataion accuracy and loss are different?
This is my code:
import tensorflow as tf
import tensorflow_datasets as tfds
data_name = 'uc_merced'
dataset = tfds.load(data_name)
# the train_data and the test_data are same dataset
train_data, test_data = dataset['train'], dataset['train']
def parse(img_dict):
img = tf.image.resize_with_pad(img_dict['image'], 256, 256)
#img = img / 255.
label = img_dict['label']
return img, label
train_data = train_data.map(parse)
train_data = train_data.batch(96)
test_data = test_data.map(parse)
test_data = test_data.batch(96)
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
model = tf.keras.applications.ResNet50(weights=None, classes=21,
input_shape=(256, 256, 3))
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(train_data, epochs=50, verbose=2, validation_data=test_data)
It is very simple and you can run it on your computer. you can see my train data and validation data are the same train_data, test_data = dataset['train'], dataset['train'].
But the train accuracy (loss) are not the same with validation accuracy (loss). Why is it happen? Is this the bug of tensorflow2.0?
Epoch 1/50
22/22 - 51s - loss: 3.3766 - accuracy: 0.2581 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/50
22/22 - 30s - loss: 1.8221 - accuracy: 0.4590 - val_loss: 123071.9851 - val_accuracy: 0.0476
Epoch 3/50
22/22 - 30s - loss: 1.4701 - accuracy: 0.5405 - val_loss: 12767.8928 - val_accuracy: 0.0519
Epoch 4/50
22/22 - 30s - loss: 1.2113 - accuracy: 0.6071 - val_loss: 3.9311 - val_accuracy: 0.1186
Epoch 5/50
22/22 - 31s - loss: 1.0846 - accuracy: 0.6567 - val_loss: 23.7775 - val_accuracy: 0.1386
Epoch 6/50
22/22 - 31s - loss: 0.9358 - accuracy: 0.7043 - val_loss: 15.3453 - val_accuracy: 0.1543
Epoch 7/50
22/22 - 32s - loss: 0.8566 - accuracy: 0.7243 - val_loss: 8.0415 - val_accuracy: 0.2548
In short, the culprit here is BatchNorm.
Since you have a small dataset and large batch size, you only do 22 updates per epoch. The BatchNorm layer has a default momentum of 0.99, so it takes some time to move the BatchNorm running means/variances to values more appropriate for your dataset (which, given you do not normalise the pixel values away from the [0, 255] range, is pretty far from the typical mean=0, variance=1 sort of range that neural networks are generally designed/initialised to expect).
The reason for the big discrepancy in train vs. validation loss/accuracy is because the training behaviour of batch norm versus the testing behaviour is very different, especially with so few batches. The mean of the data running through the network during training is very far from the running mean accumulated so far, which only updates slowly due to the default BatchNorm momentum/decay of 0.99.
If you reduce your batch size from 96 to, say, 4, you substantially increase the frequency of updates to the BatchNorm running means/variances. Doing this, plus uncommenting the #img = img / 255. line in your data parsing function, alleviates the train/validation discrepancy to a large extent. Doing so gives me this output for three epochs:
Epoch 1/7
525/525 - 51s - loss: 3.2650 - accuracy: 0.1633 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/7
525/525 - 38s - loss: 2.6455 - accuracy: 0.2152 - val_loss: 12.1067 - val_accuracy: 0.2114
Epoch 3/7
525/525 - 38s - loss: 2.5033 - accuracy: 0.2414 - val_loss: 16.9369 - val_accuracy: 0.2095
You can also keep your code the same, and instead modify the keras_applications implementation of Resnet50 to use BatchNormalization(..., momentum=0.9) everywhere. This gives me the following output after two epochs, which I think more or less shows that indeed this is the main cause of your issue:
Epoch 1/2
22/22 [==============================] - 33s 1s/step - loss: 3.1512 - accuracy: 0.2357 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/2
22/22 [==============================] - 16s 748ms/step - loss: 1.7975 - accuracy: 0.4505 - val_loss: 4.1324 - val_accuracy: 0.2810