I am trying to use TF 2.0 for MNIST dataset with batch size of 128.
During learning I get the following message in an EPOCH. If my batch size is 128, what does 375 signify?
Epoch 13/200
375/375 [==============================] - 2s 4ms/step - loss: 0.2351 - accuracy: 0.9330 - val_loss: 0.2258 - val_accuracy: 0.9370
import tensorflow as tf
import numpy as np
from tensorflow import keras
# Network and training parameters.
EPOCHS = 200
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 10 # number of outputs = number of digits
N_HIDDEN = 128
VALIDATION_SPLIT = 0.2 # how much TRAIN is reserved for VALIDATION
# Loading MNIST dataset.
# verify
# You can verify that the split between train and test is 60,000, and 10,000 respectively.
# Labels have one-hot representation.is automatically applied
mnist = keras.datasets.mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# X_train is 60000 rows of 28x28 values; we --> reshape it to
# 60000 x 784.
RESHAPED = 784
#
X_train = X_train.reshape(60000, RESHAPED)
X_test = X_test.reshape(10000, RESHAPED)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# Normalize inputs to be within in [0, 1].
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# One-hot representation of the labels.
Y_train = tf.keras.utils.to_categorical(Y_train, NB_CLASSES)
Y_test = tf.keras.utils.to_categorical(Y_test, NB_CLASSES)
model = tf.keras.models.Sequential()
model.add(keras.layers.Dense(N_HIDDEN,
input_shape=(RESHAPED,),
name='dense_layer', activation='relu'))
model.add(keras.layers.Dense(N_HIDDEN,
name='dense_layer_2', activation='relu'))
model.add(keras.layers.Dense(NB_CLASSES,
name='dense_layer_3', activation='softmax'))
# Compiling the model.
model.compile(optimizer='SGD',
loss='categorical_crossentropy',
metrics=['accuracy'])
# Training the model.
model.fit(X_train, Y_train,
batch_size=BATCH_SIZE, epochs=EPOCHS,
verbose=VERBOSE, validation_split=VALIDATION_SPLIT)
The 375 is the number of steps per epoch your model must complete in order to finish one epoch. (Every epoch your model will process steps_per_epoch number of batches)
steps_per_epoch = len(X_train) // batch_size
but for your case, since you have specified 0.20 value to the validation_split parameter to model.fit()
steps_per_epoch = (len(X_train)*(1-VALIDATION_SPLIT)) // batch_size
so plugging the values,
steps = ((60,000) * (1-0.20)) // 128 = 375
Related
I'm trying to work with a image classification model for gravity waves detection.
So I want to check if there is something I could do to lower validation loss %, or more importantly, increase validation accuracy %.
The dataset is about a total of 460 images, split into
300 images that belong to 2 classes
60 images belonging to 2 classes
100 images belonging to 2 classes
For context, this is the code for pre processing:
batch_size = 32
Data Augmentation:
train_datagen = ImageDataGenerator(
rescale=1./255,
shear_range=0.2,
horizontal_flip=True,
)
test_datagen = ImageDataGenerator(
rescale=1./255)
The generator that reads images to generate batches of augmented image data
train_generator = train_datagen.flow_from_directory(
'./train', target directory
target_size=(256, 256), images resized to 150x150
batch_size=batch_size,
# batch_size=40,
class_mode='binary')
validation_generator = train_datagen.flow_from_directory(
'./validation',
target_size=(256, 256),
batch_size=batch_size,
# batch_size=20,
class_mode='binary')
test_generator = test_datagen.flow_from_directory(
'./test',
target_size=(256, 256),
# batch_size=batch_size,
batch_size=batch_size,
#class_mode=None,
class_mode= None,
shuffle=False)
And this is the model used:
from tensorflow.keras.applications.inception_v3 import InceptionV3
import tensorflow as tf
from keras import regularizers
base_model = InceptionV3(input_shape = (256,256,3), include_top = False, weights = 'imagenet')
x = layers.Flatten()(base_model.output)
x = layers.Dense(1024, activation='relu')(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(1, activation='sigmoid')(x)
model = Model(base_model.input, x)
model.compile(optimizer = tf.keras.optimizers.SGD(learning_rate=0.001), loss = 'binary_crossentropy', metrics = ['accuracy'])
fitness = model.fit(
train_generator,
steps_per_epoch= 120,
epochs = 100,
validation_data=validation_generator,
validation_steps= 64)
So far the accuracy and loss % have been around:
Average training accuracy: 0.9237500047683715
Average training loss: 0.17309745135484264
Average validation accuracy: 0.6489999979734421
Average validation loss: 0.9121886402368545
The predicitons have been around:
validation predictions:
(24, 36)
test predictions:
(45, 55)
And the confusion matrix:
Confusion Matrix:
array([[12, 18],
[12, 18]])
I am trying the code from tensorflow "Writing a training loop from scratch" with some changes by myself. I changed the loss function from SparseCategoricalCrossentropy to MeanSquaredError. I also changed the architecture of the model by adding a new Lambda layer for loss calculation. However, I have the Value error that no gradients provided for variable. Is there any way that I can make the code to run with MSE?
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
inputs = keras.Input(shape=(784,), name="digits")
x1 = layers.Dense(64, activation="relu")(inputs)
x2 = layers.Dense(64, activation="relu")(x1)
outputs = layers.Dense(10, name="predictions")(x2)
final_outputs = layers.Lambda(lambda x: tf.math.argmax(x, axis = -1))(outputs)
model = keras.Model(inputs=inputs, outputs=final_outputs)
# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.MeanSquaredError()
# Prepare the training dataset.
batch_size = 64
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
# Reserve 10,000 samples for validation.
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
# Prepare the training dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)
epochs = 2
for epoch in range(epochs):
print("\nStart of epoch %d" % (epoch,))
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
with tf.GradientTape() as tape:
logits = model(x_batch_train, training=True)
loss_value = loss_fn(y_batch_train, logits)
grads = tape.gradient(loss_value, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))
argmax ops is not differentiable. To use an integer label and MSE loss, you want to convert your labels y_train and y_val to integers.
y_train = np.argmax(y_train, axis=-1)
y_val = np.argmax(y_val, axis=-1)
And adjust the output layer to output integer labels
outputs = layers.Dense(1, name="predictions")(x2)
I would like to know
(1) how often the call() method of tf.keras.losses.Loss
and the update_state() method of tf.keras.metrics.Metric gets called during a training:
are they called per each instance (observation)?
or called per each batch?
(2) the dimension of y_true and y_pred passed to those methods:
are their dimension (batch_size x output_dimension)
or (1 x output_dimension)
The following code snippet comes from
https://www.tensorflow.org/guide/keras/train_and_evaluate
For experiment I insert print(y_true.shape, y_pred.shape) in update_state() and I find that it is only printed once in the first epoch. From the print, it looks like y_true and y_pred have the dimension of
(1 x output_dimension) in this particular example but is it always the case?
So, additionally
(3) I would like to know why it is printed only once and only in the first epoch.
(4) I can't print the value of y_true or y_pred. How can I?
Epoch 1/3
(None, 1) (None, 10)
(None, 1) (None, 10)
782/782 [==============================] - 3s 4ms/step - loss: 0.5666 - categorical_true_positives: 22080.8940
Epoch 2/3
782/782 [==============================] - 3s 4ms/step - loss: 0.1680 - categorical_true_positives: 23877.1162
Epoch 3/3
782/782 [==============================] - 3s 4ms/step - loss: 0.1190 - categorical_true_positives: 24198.2733
<tensorflow.python.keras.callbacks.History at 0x1fb132cde80>
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
y_train = y_train.astype("float32")
y_test = y_test.astype("float32")
# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
class CategoricalTruePositives(keras.metrics.Metric):
def __init__(self, name="categorical_true_positives", **kwargs):
super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
self.true_positives = self.add_weight(name="ctp", initializer="zeros")
def update_state(self, y_true, y_pred, sample_weight=None):
print(y_true.shape, y_pred.shape) # For experiment
y_pred = tf.reshape(tf.argmax(y_pred, axis=1), shape=(-1, 1))
values = tf.cast(y_true, "int32") == tf.cast(y_pred, "int32")
values = tf.cast(values, "float32")
if sample_weight is not None:
sample_weight = tf.cast(sample_weight, "float32")
values = tf.multiply(values, sample_weight)
self.true_positives.assign_add(tf.reduce_sum(values))
def result(self):
return self.true_positives
def reset_states(self):
# The state of the metric will be reset at the start of each epoch.
self.true_positives.assign(0.0)
model.compile(
optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
loss=keras.losses.SparseCategoricalCrossentropy(),
metrics=[CategoricalTruePositives()],
)
model.fit(x_train, y_train, batch_size=64, epochs=3)
(1) how often the call() method of tf.keras.losses.Loss and the update_state() method of tf.keras.metrics.Metric gets called during a training:
The call method of tf.keras.losses.Loss and the update_state() are used at the end of each batch.
(2) the dimension of y_true and y_pred passed to those methods:
The dimensions of y_true is same as what you pass in y_train. The only change is, the first dimension of y_train will be no_of samples and in the case of y_true it will be batch_size. In your case it is (64, 1) where 64 is batch_size.
The dimensions of y_pred is the shape of output of the model. In your case it is (64, 10) because you have 10 dense units in final layer.
(3) I would like to know why it is printed only once and only in the first epoch.
The print statement is executed only once because tensorflow is executed in graph mode. Print will only work in eager mode. Add run_eagerly = True in model.compile step if you want to execute tensorflow code in eager mode.
(4) I can't print the value of y_true or y_pred. How can I?
Run the code in eager mode.
Code:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
y_train = y_train.astype("float32")
y_test = y_test.astype("float32")
# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
class CategoricalTruePositives(keras.metrics.Metric):
def __init__(self, name="categorical_true_positives", **kwargs):
super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
self.true_positives = self.add_weight(name="ctp", initializer="zeros")
def update_state(self, y_true, y_pred, sample_weight=None):
print('update_state', y_true.shape, y_pred.shape) # For experiment
y_pred = tf.reshape(tf.argmax(y_pred, axis=1), shape=(-1, 1))
values = tf.cast(y_true, "int32") == tf.cast(y_pred, "int32")
values = tf.cast(values, "float32")
if sample_weight is not None:
sample_weight = tf.cast(sample_weight, "float32")
values = tf.multiply(values, sample_weight)
self.true_positives.assign_add(tf.reduce_sum(values))
def result(self):
return self.true_positives
def reset_states(self):
# The state of the metric will be reset at the start of each epoch.
self.true_positives.assign(0.0)
class CustomCallback(tf.keras.callbacks.Callback):
def on_epoch_begin(self, epoch, logs=None):
print("Start epoch {} of training".format(epoch))
def on_train_batch_begin(self, batch, logs=None):
keys = list(logs.keys())
print("...Training: start of batch {}".format(batch))
def on_train_batch_end(self, batch, logs=None):
print("...Training: end of batch {}".format(batch))
model.compile(
optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
loss=keras.losses.SparseCategoricalCrossentropy(),
metrics=[CategoricalTruePositives()],
run_eagerly = True,
)
model.fit(x_train, y_train, batch_size=64, epochs=3, verbose = 0, callbacks=[CustomCallback()])
Output:
Start epoch 0 of training
...Training: start of batch 0
update_state (64, 1) (64, 10)
...Training: end of batch 0
...Training: start of batch 1
update_state (64, 1) (64, 10)
...Training: end of batch 1
...Training: start of batch 2
update_state (64, 1) (64, 10)
...Training: end of batch 2
...Training: start of batch 3
update_state (64, 1) (64, 10)
...Training: end of batch 3
...Training: start of batch 4
update_state (64, 1) (64, 10)
...Training: end of batch 4
...Training: start of batch 5
update_state (64, 1) (64, 10)
...Training: end of batch 5
The above example will make the answer to your clear.
I tried the binary classification using MNIST only number “1” and “5”.
But the accuracy isn’t well.. The following program is anything wrong?
If you find something, please give me some advice.
loss: -9.9190e+04
accuracy: 0.5599
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0
train_filter = np.where((y_train == 1) | (y_train == 5))
test_filter = np.where((y_test == 1) | (y_test == 5))
x_train, y_train = x_train[train_filter], y_train[train_filter]
x_test, y_test = x_test[test_filter], y_test[test_filter]
print("x_train", x_train.shape)
print("x_test", x_test.shape)
# x_train (12163, 28, 28)
# x_test (2027, 28, 28)
model = keras.Sequential(
[
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation="relu"),
keras.layers.Dense(1, activation="sigmoid"),
]
)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)
loss, acc = model.evaluate(x_test, y_test, verbose=2)
print("accuracy:", acc)
# 2027/1 - 0s - loss: -9.9190e+04 - accuracy: 0.5599
# accuracy: 0.5599408
Your y_train and y_test is filled with class labels 1 and 5, but sigmoid activation in your last layer is squashing output between 0 and 1.
if you change 5 into 0 in your y you will get a really high accuracy:
y_train = np.where(y_train == 5, 0, y_train)
y_test = np.where(y_test == 5, 0, y_test)
result:
64/64 - 0s - loss: 0.0087 - accuracy: 0.9990
I created two identical neural networks for the MNIST dataset, one using TensorFlow and one using Keras. At 10 epochs, Keras achieves over 96% performance, while TensorFlow achieves about 70%.
I have tested the codes below on other datasets as well, and TensorFlow achieved much lower performance on all of them in a direct parameter comparison.
Keras Code:
import warnings
warnings.filterwarnings('ignore')
import keras
from keras.datasets import mnist
# Loading MNIST
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Converting the y-value column to an array of classes (one hot enconding)
from keras.utils import np_utils
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
# Changing the shape of input images and normalizing
x_train = x_train.reshape((60000, 784))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 784))
x_test = x_test.astype('float32') / 255
# Making the neural network
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
model.add(Dense(30, input_dim=784, kernel_initializer='normal', activation='relu'))
model.add(Dense(30, kernel_initializer='normal', activation='relu'))
model.add(Dense(10, kernel_initializer='normal', activation='softmax'))
from keras.optimizers import Adam
optimizer = Adam()
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
# Training and showing the results
model.fit(x_train, y_train, epochs=10, batch_size=200, validation_data=(x_test, y_test), verbose=1)
TensorFlow Code:
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf
#Loading MNIST
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Epochs parameters
epochs = 10
batch_size = 200
# Neural network parameters
n_input = 784
n_hidden_1 = 30
n_hidden_2 = 30
n_classes = 10
# Placeholders x, y
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
# Creating the first layer
w1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
b1 = tf.Variable(tf.random_normal([n_hidden_1]))
layer_1 = tf.nn.relu(tf.add(tf.matmul(x,w1),b1))
# Creating the second layer
w2 = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
b2 = tf.Variable(tf.random_normal([n_hidden_2]))
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1,w2),b2))
# Creating the output layer
w_out = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
bias_out = tf.Variable(tf.random_normal([n_classes]))
output = tf.add(tf.matmul(layer_2, w_out), bias_out)
# Loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = output, labels = y))
# Optimizer
optimizer = tf.train.AdamOptimizer().minimize(cost)
# Making predictions
predictions = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
# Accuracy
accuracy = tf.reduce_mean(tf.cast(predictions, tf.float32))
# Initializing the variables
init = tf.global_variables_initializer()
# Opening the session
with tf.Session() as sess:
sess.run(init)
# Training cycle
for epoch in range(epochs):
avg_cost = 0.0
total_batches = int(mnist.train.num_examples / batch_size)
# Loop through all batch iterations
for i in range(total_batches):
batch_x, batch_y = mnist.train.next_batch(batch_size)
# Fit training
sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
# Computing the average cost of a complete epoch
avg_cost += sess.run(cost, feed_dict={x: batch_x, y: batch_y}) / total_batches
# Running accuracy (with test data) on each epoch
accuracy_test = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
# Showing results after each epoch
print ("Epoch: ", "{},".format((epoch + 1)), "Average cost = ", "{:.3f}".format(avg_cost))
print ("Accuracy Test = ", "{:.3f}".format(accuracy_test))
print ("Training completed!")
print ("Model Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
Could anyone help me understand what may be causing this divergence?
Normalization drastically improves a model's accuracy.
In the keras code you have performed normalization on data, however I can't find any normalization done in the tensorflow code or perhaps the mnist data in tensorflow is already normalized at source?
Please verify that mnist data in tensorflow code is also normalized