Each time I run:
y_true = np.argmax(tf.concat([y for x, y in train_ds], axis=0), axis=1)
y_pred = np.argmax(model.predict(train_ds), axis=1)
confusion_matrix(y_true, y_pred)
The result is different each time. As far as I understand, the line
y_pred = np.argmax(model.predict(train_ds), axis=1) returns something different each time.
Clarification: I run cell 1 (training) once and cell 2 (inference) a few times.
Cell 1 (jupyter)
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, experimental
from tensorflow.keras.layers import MaxPool2D, Flatten, Dense
from tensorflow.keras import Model
from tensorflow.keras.losses import categorical_crossentropy
from sklearn.metrics import accuracy_score
image_size = (100, 100)
batch_size = 32
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
inputs = Input(shape=(100, 100, 1))
x = experimental.preprocessing.Rescaling(1./255)(inputs)
x = Conv2D(filters=4, kernel_size=3, padding='same', activation='relu')(x)
x = Conv2D(filters=4, kernel_size=3, padding='same', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)
x = Conv2D(filters=8, kernel_size=3, padding='same', activation='relu')(x)
x = Conv2D(filters=8, kernel_size=3, padding='same', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='same')(x)
x = Flatten()(x)
x = Dense(units=4, activation='relu')(x)
x = Dense(units=4, activation='relu')(x)
output = Dense(units=5, activation='softmax')(x)
model = Model(inputs=inputs, outputs=output)
model.fit(train_ds, epochs=5)
Cell 2:
print('Accuracy:')
y_pred = np.argmax(model.predict(train_ds), axis=1)
print(accuracy_score(y_true, y_pred))
y_pred = np.argmax(model.predict(train_ds), axis=1)
print(accuracy_score(y_true, y_pred))
118/118 [==============================] - 7s 57ms/step - loss: 0.1888 - accuracy: 0.9398
Are you sure you are not retraining the model every time you run the code? If the model's parameters are unchanged, the prediction for the same input should be identical every time.
As far as I currently understand, the reason is that iterating over train_ds yields the images in a different order each time. For instance:
First run:
[x for x, y in train_ds]
[<tf.Tensor: shape=(32, 100, 100, 1), dtype=float32, numpy= array([[[[157.],
Second run:
[x for x, y in train_ds]
[<tf.Tensor: shape=(32, 100, 100, 1), dtype=float32, numpy= array([[[[ 34.],
[ 36.],
[ 39.],
A possible solution:
imgs, y_true = [], []
for img, label in train_ds:
    # Collect images and labels in the same pass so their order matches.
    imgs.append(img)
    y_true.append(label)
imgs = tf.concat(imgs, axis=0)
y_true = np.argmax(tf.concat(y_true, axis=0), axis=1)
y_pred = np.argmax(model.predict(imgs), axis=1)
print(accuracy_score(y_true, y_pred))
y_pred = np.argmax(model.predict(imgs), axis=1)
print(accuracy_score(y_true, y_pred))
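The misalignment can be reproduced without TensorFlow. The sketch below is a toy stand-in, not tf.data itself: ReshufflingDataset, perfect_model and the integer "images" are hypothetical, and it assumes (as the two different first images above suggest) that train_ds reshuffles on every iteration. Taking labels from one pass and predictions from another misaligns them even for a model that is always right, while collecting both from the same pass keeps them aligned.

```python
import random


class ReshufflingDataset:
    """Toy stand-in for a shuffled tf.data pipeline: every new iteration
    yields the (image, label) pairs in a fresh random order."""

    def __init__(self, pairs, batch_size, seed=0):
        self.pairs = list(pairs)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def __iter__(self):
        order = self.pairs[:]
        self.rng.shuffle(order)  # a new order on every pass
        for i in range(0, len(order), self.batch_size):
            batch = order[i:i + self.batch_size]
            yield [img for img, _ in batch], [lbl for _, lbl in batch]


def perfect_model(images):
    # Hypothetical model that is always right, so any error below
    # comes purely from misaligned ordering.
    return [img % 5 for img in images]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# 100 hypothetical "images" (ints) whose label is image % 5.
ds = ReshufflingDataset([(i, i % 5) for i in range(100)], batch_size=32)

# Two passes: labels from one iteration, predictions from another.
y_true_two = [lbl for _, lbls in ds for lbl in lbls]
y_pred_two = [p for imgs, _ in ds for p in perfect_model(imgs)]
acc_two_pass = accuracy(y_true_two, y_pred_two)

# One pass: labels and predictions come from the same batches.
y_true_one, y_pred_one = [], []
for imgs, lbls in ds:
    y_true_one.extend(lbls)
    y_pred_one.extend(perfect_model(imgs))
acc_one_pass = accuracy(y_true_one, y_pred_one)

print(acc_two_pass)  # typically far below 1.0
print(acc_one_pass)  # 1.0
```

The one-pass loop is the same idea as collecting imgs and y_true inside a single for loop over train_ds.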
Is there any better solution?
A more appropriate approach, at least for a validation dataset (train_ds is used here only as an example), may be to pass shuffle=False:
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
This is probably the best option if your test images are in a separate folder.
path = 'your path to test folder'
test_generator = ImageDataGenerator().flow_from_directory(
target_size=(512, 512)
This is better than OPTION 1, since it also works for datasets that don't fit into memory (RAM).
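The memory advantage comes from evaluating batch by batch instead of first concatenating every image into one tensor. Below is a plain-Python sketch of that pattern, with a hypothetical lazy_batches loader standing in for the generator and a dummy predict function: only running counters are kept, and each batch's images and labels come from the same step, so their order always matches. (flow_from_directory also accepts shuffle=False if a fixed order is needed.)

```python
def streaming_accuracy(batches, predict):
    """Accuracy accumulated batch by batch: only the current batch of images
    and two counters are held in memory. `batches` yields (images, labels)."""
    correct = total = 0
    for images, labels in batches:
        preds = predict(images)
        correct += sum(p == y for p, y in zip(preds, labels))
        total += len(labels)
    return correct / total


def lazy_batches(n, batch_size):
    # Hypothetical loader: each batch is built only when it is requested,
    # much like a generator reading images from disk.
    for start in range(0, n, batch_size):
        images = list(range(start, min(start + batch_size, n)))
        yield images, [img % 2 for img in images]


# A dummy model that always predicts class 0 is right for the 50 even images.
acc = streaming_accuracy(lazy_batches(100, 32), lambda imgs: [0] * len(imgs))
print(acc)  # 0.5
```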
Below is the code I am using to customize global average pooling. My goal is to replace the line "return np.mean(inputs, axis=(1, 2))" with my own custom pooling method. The code runs, but I'm having problems with gradients and I can't reproduce the result of global average pooling. I get the warning below. Can you help me, please?
WARNING:tensorflow:Gradients do not exist for variables ['conv2d_2/kernel:0', 'conv2d_2/bias:0', 'conv2d_3/kernel:0', 'conv2d_3/bias:0'] when minimizing the loss. If you're using model.compile(), did you forget to provide a lossargument?
WARNING:tensorflow:Gradients do not exist for variables ['conv2d_2/kernel:0', 'conv2d_2/bias:0', 'conv2d_3/kernel:0', 'conv2d_3/bias:0'] when minimizing the loss. If you're using model.compile(), did you forget to provide a lossargument?
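For reference, the computation being replaced (np.mean(inputs, axis=(1, 2)) on an NHWC tensor) and the gradient it should pass back can be written out in plain Python. This is a sketch with hypothetical helper names (global_avg_pool, global_avg_pool_grad) operating on nested lists: every spatial position of channel k receives upstream[k] / (H * W). That differs from gen_nn_ops.max_pool_grad_v2, which custom_grad in the code below returns and which routes the gradient of a max-pooling window only to its maximum, so comparing against this reference may help when checking a custom gradient.

```python
def global_avg_pool(x):
    """Mean over height and width for an NHWC batch given as nested lists
    [batch][height][width][channels]; returns [batch][channels]. This is what
    np.mean(inputs, axis=(1, 2)) computes."""
    out = []
    for img in x:
        h, w, c = len(img), len(img[0]), len(img[0][0])
        out.append([sum(img[i][j][k] for i in range(h) for j in range(w)) / (h * w)
                    for k in range(c)])
    return out


def global_avg_pool_grad(x, upstream):
    """Gradient of global average pooling w.r.t. its input: every position
    (i, j) of channel k receives upstream[b][k] / (H * W)."""
    grads = []
    for b, img in enumerate(x):
        h, w, c = len(img), len(img[0]), len(img[0][0])
        grads.append([[[upstream[b][k] / (h * w) for k in range(c)]
                       for _ in range(w)]
                      for _ in range(h)])
    return grads


# One 2x2 image with 2 channels.
x = [[[[1.0, 10.0], [3.0, 20.0]],
      [[5.0, 30.0], [7.0, 40.0]]]]
print(global_avg_pool(x))  # [[4.0, 25.0]]
print(global_avg_pool_grad(x, [[1.0, 2.0]])[0][0][0])  # [0.25, 0.5]
```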
import numpy as np
import tensorflow as tf
import math
import os
import random
import itertools
from tensorflow.python.ops import gen_nn_ops
from tensorflow.python.ops import array_ops
import numba as nb
import matplotlib.pyplot as plt
import skimage
from keras import backend as K
class LGPooling2D(tf.keras.layers.Layer):
    def __init__(self, pool_size=(3, 3), strides=(2, 2), padding='SAME', data_format='channels_last', **kwargs):
        super(LGPooling2D, self).__init__(**kwargs)
        self.pool_size = pool_size
        self.strides = strides
        self.padding = padding
        self.data_format = 'NHWC' if data_format == 'channels_last' else 'NCHW'
        self.output_dim = 64
    def build(self, input_shape):
        super(LGPooling2D, self).build(input_shape)
    def _pooling_function(self, x, name=None):
        #b = K.shape(x)[0]
        input_shape = tf.keras.backend.int_shape(x)
        b, r, c, channel = input_shape[0], input_shape[1], input_shape[2], input_shape[3]
        def _mid_pool(inputs, is_train):
            return np.mean(inputs, axis=(1, 2))  # we change this part.
        def custom_grad(op, grad):
            if self.data_format == 'NHWC':
                ksizes = [1, self.pool_size[0], self.pool_size[1], 1]
                strides = [1, self.strides[0], self.strides[1], 1]
            else:
                ksizes = [1, 1, self.pool_size[0], self.pool_size[1]]
                strides = [1, 1, self.strides[0], self.strides[1]]
            return gen_nn_ops.max_pool_grad_v2(
            ), tf.constant(0.0)
        def py_func(func, inp, Tout, stateful=True, name=None, grad=None, rnd_name=None):
            # Need to generate a unique name to avoid duplicates:
            g = tf.compat.v1.get_default_graph()
            with g.gradient_override_map({"PyFunc": rnd_name}):
                return tf.compat.v1.py_func(func, inp, Tout, stateful=stateful, name=name)
        def _mid_range_pool(x, name=None):
            rnd_name = 'LGPooling2D' + str(np.random.randint(0, 1E+8))
            with tf.compat.v1.name_scope(name, "mod", [x]) as name:
                z = py_func(_mid_pool,
                            [x, tf.keras.backend.learning_phase()],
                            grad=custom_grad, rnd_name=rnd_name)[0]
                z.set_shape((b, channel))
                return z
        return _mid_range_pool(x, name)
    def compute_output_shape(self, input_shape):
        r, c = input_shape[1], input_shape[2]
        sr, sc = self.strides
        num_r = math.ceil(r/sr) if self.padding == 'SAME' else r//sr
        num_c = math.ceil(c/sc) if self.padding == 'SAME' else c//sc
        return (input_shape[0], input_shape[3])
    def call(self, inputs):
        # K.in_train_phase(self._tf_pooling_function(inputs), self._pooling_function_test(inputs))
        input_shape = tf.shape(inputs)
        output = self._pooling_function(inputs)
        # output = tf.reshape(output, self.compute_output_shape(input_shape))
        return output
    def get_config(self):
        config = {
            'pool_size': self.pool_size,
            'strides': self.strides
        }
        base_config = super(LGPooling2D, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
def get_model():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=input_shape))
    # model.add(tf.keras.layers.SpatialDropout2D(0.15))
    model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'))
    # model.add(tf.keras.layers.SpatialDropout2D(0.1))
    # model.add(tf.keras.layers.MaxPooling2D())
    model.add(LGPooling2D(pool_size=(3, 3), strides=(2, 2)))
    # model.add(tf.keras.layers.GlobalAveragePooling2D())
    # model.add(tf.keras.layers.Flatten())
    # model.add(tf.keras.layers.Dense(100, activation='relu'))
    # model.add(tf.keras.layers.Dropout(0.1))
    model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
    optim = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=1e-6)
    model.compile(loss="categorical_crossentropy", optimizer=optim, metrics=['accuracy'])
    return model
batch_size = 32
num_classes = 100
epochs = 50
# input image dimensions
img_rows, img_cols = 32, 32
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar100.load_data()
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 3)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 3)
input_shape = (img_rows, img_cols, 3)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255.0
x_test /= 255.0
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
reduceLROnPlat = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.4, patience=6, verbose=1, mode='min', min_delta=0.0001, cooldown=5, min_lr=0.00001)
callbacks_list = [reduceLROnPlat]
print('custom pooling')
model = get_model()
# model = get_model(IMG_SHAPE = (32, 32, 3))
tf.keras.utils.plot_model(model, to_file='LbpModel.png', show_shapes=True)
modelcustom = model.fit(x_train, y_train,
I would like to know
(1) how often the call() method of tf.keras.losses.Loss
and the update_state() method of tf.keras.metrics.Metric get called during training:
are they called once per instance (observation),
or once per batch?
(2) the dimensions of y_true and y_pred passed to those methods:
are they (batch_size x output_dimension),
or (1 x output_dimension)?
The following code snippet comes from
As an experiment, I inserted print(y_true.shape, y_pred.shape) into update_state() and found that it is printed only once, in the first epoch. From the print, it looks like y_true and y_pred have the dimensions
(1 x output_dimension) in this particular example, but is that always the case?
So, additionally
(3) I would like to know why it is printed only once and only in the first epoch.
(4) I can't print the value of y_true or y_pred. How can I?
Epoch 1/3
(None, 1) (None, 10)
(None, 1) (None, 10)
782/782 [==============================] - 3s 4ms/step - loss: 0.5666 - categorical_true_positives: 22080.8940
Epoch 2/3
782/782 [==============================] - 3s 4ms/step - loss: 0.1680 - categorical_true_positives: 23877.1162
Epoch 3/3
782/782 [==============================] - 3s 4ms/step - loss: 0.1190 - categorical_true_positives: 24198.2733
<tensorflow.python.keras.callbacks.History at 0x1fb132cde80>
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
y_train = y_train.astype("float32")
y_test = y_test.astype("float32")
# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
class CategoricalTruePositives(keras.metrics.Metric):
    def __init__(self, name="categorical_true_positives", **kwargs):
        super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
        self.true_positives = self.add_weight(name="ctp", initializer="zeros")
    def update_state(self, y_true, y_pred, sample_weight=None):
        print(y_true.shape, y_pred.shape)  # For experiment
        y_pred = tf.reshape(tf.argmax(y_pred, axis=1), shape=(-1, 1))
        values = tf.cast(y_true, "int32") == tf.cast(y_pred, "int32")
        values = tf.cast(values, "float32")
        if sample_weight is not None:
            sample_weight = tf.cast(sample_weight, "float32")
            values = tf.multiply(values, sample_weight)
        self.true_positives.assign_add(tf.reduce_sum(values))
    def result(self):
        return self.true_positives
    def reset_states(self):
        # The state of the metric will be reset at the start of each epoch.
        self.true_positives.assign(0.0)
model.fit(x_train, y_train, batch_size=64, epochs=3)
(1) how often the call() method of tf.keras.losses.Loss and the update_state() method of tf.keras.metrics.Metric gets called during a training:
Both the call() method of tf.keras.losses.Loss and the update_state() method of tf.keras.metrics.Metric are called once per batch, not once per instance.
(2) the dimension of y_true and y_pred passed to those methods:
y_true is the batch of targets taken from the y_train you pass in, so its first dimension is the batch size rather than the number of samples. In your case it is (64, 1), where 64 is the batch size (your y_train is 1-D and shows up with a trailing dimension of 1).
y_pred has the shape of the model's output. In your case it is (64, 10), because the final Dense layer has 10 units.
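To make the batch-level behaviour concrete, the sketch below mirrors the metric's logic in plain Python (ToyCategoricalTruePositives is a hypothetical stand-in, not Keras code) and checks the step count: after reserving 10,000 of MNIST's 60,000 training images for validation, 50,000 remain, which at batch_size=64 gives 782 steps per epoch, matching "782/782" in the log. The final batch holds only 16 samples, which is consistent with the batch dimension showing up as None when the step is traced.

```python
import math


class ToyCategoricalTruePositives:
    """Plain-Python mirror of the CategoricalTruePositives metric in the
    question: update_state() receives a whole batch at once."""

    def __init__(self):
        self.true_positives = 0.0
        self.update_calls = 0

    def update_state(self, y_true, y_pred):
        # y_true: (batch_size, 1) labels; y_pred: (batch_size, num_classes) scores.
        self.update_calls += 1
        for (label,), scores in zip(y_true, y_pred):
            predicted = max(range(len(scores)), key=scores.__getitem__)  # argmax
            self.true_positives += float(predicted == int(label))

    def result(self):
        return self.true_positives

    def reset_states(self):
        self.true_positives = 0.0


num_samples, batch_size = 50_000, 64
steps = math.ceil(num_samples / batch_size)
last_batch = num_samples - (steps - 1) * batch_size
print(steps, last_batch)  # 782 16

# Two toy batches with 3-class scores: update_state() runs once per batch.
metric = ToyCategoricalTruePositives()
batches = [
    ([[2.0], [0.0]], [[0.1, 0.2, 0.7], [0.9, 0.05, 0.05]]),  # both correct
    ([[1.0]], [[0.6, 0.3, 0.1]]),                              # wrong: argmax is 0
]
for y_true, y_pred in batches:
    metric.update_state(y_true, y_pred)
print(metric.result(), metric.update_calls)  # 2.0 2
```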
(3) I would like to know why it is printed only once and only in the first epoch.
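The print statement runs only once because model.fit executes the training step as a TensorFlow graph: Python's print runs only while that function is being traced, and the resulting graph is then reused for every later batch and epoch. During tracing the batch dimension is not yet known, which is why the shapes appear as (None, 1) and (None, 10). Pass run_eagerly=True to model.compile() if you want the step to execute eagerly, so that print runs on every batch.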
(4) I can't print the value of y_true or y_pred. How can I?
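Run the code in eager mode (run_eagerly=True), as in the example below: update_state() then runs as ordinary Python on every batch, and y_true and y_pred are concrete tensors whose values can be printed. Alternatively, tf.print(y_true, y_pred) inside update_state() prints the values at every step even in graph mode.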
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
y_train = y_train.astype("float32")
y_test = y_test.astype("float32")
# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
class CategoricalTruePositives(keras.metrics.Metric):
    def __init__(self, name="categorical_true_positives", **kwargs):
        super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
        self.true_positives = self.add_weight(name="ctp", initializer="zeros")
    def update_state(self, y_true, y_pred, sample_weight=None):
        print('update_state', y_true.shape, y_pred.shape)  # For experiment
        y_pred = tf.reshape(tf.argmax(y_pred, axis=1), shape=(-1, 1))
        values = tf.cast(y_true, "int32") == tf.cast(y_pred, "int32")
        values = tf.cast(values, "float32")
        if sample_weight is not None:
            sample_weight = tf.cast(sample_weight, "float32")
            values = tf.multiply(values, sample_weight)
        self.true_positives.assign_add(tf.reduce_sum(values))
    def result(self):
        return self.true_positives
    def reset_states(self):
        # The state of the metric will be reset at the start of each epoch.
        self.true_positives.assign(0.0)
class CustomCallback(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        print("Start epoch {} of training".format(epoch))
    def on_train_batch_begin(self, batch, logs=None):
        keys = list(logs.keys())
        print("...Training: start of batch {}".format(batch))
    def on_train_batch_end(self, batch, logs=None):
        print("...Training: end of batch {}".format(batch))
run_eagerly = True,
model.fit(x_train, y_train, batch_size=64, epochs=3, verbose = 0, callbacks=[CustomCallback()])
Start epoch 0 of training
...Training: start of batch 0
update_state (64, 1) (64, 10)
...Training: end of batch 0
...Training: start of batch 1
update_state (64, 1) (64, 10)
...Training: end of batch 1
...Training: start of batch 2
update_state (64, 1) (64, 10)
...Training: end of batch 2
...Training: start of batch 3
update_state (64, 1) (64, 10)
...Training: end of batch 3
...Training: start of batch 4
update_state (64, 1) (64, 10)
...Training: end of batch 4
...Training: start of batch 5
update_state (64, 1) (64, 10)
...Training: end of batch 5
The example above should make the answer to your questions clear.
I don't understand what I'm doing wrong: every time I run this code I get a different result.
I noticed that the result also changes when I change the batch size, but accuracy should not depend on the batch size.
Also, the charts look completely different from what I expected.
Could somebody point out my mistake?
%reload_ext autoreload
%autoreload 2
%matplotlib inline
## Load dataset
import tensorflow as tf
import tensorflow_datasets as tfds # must be 2.1
import matplotlib.pyplot as plt
builder = tfds.builder('beans')
info = builder.info
datasets = builder.as_dataset()
raw_train_dataset, raw_test_dataset = datasets['train'], datasets['test']
## Build a model
def get_model(image_width:int, image_height:int, num_classes:int):
    model = tf.keras.Sequential()
    # layer 1
    model.add(tf.keras.layers.Conv2D(filters=96, kernel_size=(11,11), strides=4, padding='valid',
                                     activation=tf.keras.activations.relu, input_shape=(image_width, image_height, 3)))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2,2), padding='valid'))
    # layer 2
    model.add(tf.keras.layers.Conv2D(filters=256, kernel_size=(5,5), strides=1, padding='same',
                                     activation=tf.keras.activations.relu))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2,2), padding='valid'))
    # layer 3
    model.add(tf.keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=1, padding='same', activation=tf.keras.activations.relu))
    model.add(tf.keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=1, padding='same', activation=tf.keras.activations.relu))
    model.add(tf.keras.layers.Conv2D(filters=256, kernel_size=(3,3), strides=1, padding='same', activation=tf.keras.activations.relu))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2,2), padding='valid'))
    # layer 4
    model.add(tf.keras.layers.Dense(4096, activation=tf.keras.activations.relu))
    # layer 5
    model.add(tf.keras.layers.Dense(4096, activation=tf.keras.activations.relu))
    # layer 6
    model.add(tf.keras.layers.Dense(1000, activation='relu'))
    # layer 7
    model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
    return model
IMAGE_width = 500
IMAGE_height = 500
NUM_CLASSES = info.features["label"].num_classes
CLASS_NAMES = info.features["label"].names
model = get_model(IMAGE_width, IMAGE_height, NUM_CLASSES)
result = model.compile( loss=tf.keras.losses.CategoricalCrossentropy(),
## Train
def prepare_record(record):
    image = record['image']
    # image = tf.image.resize(image, (image_width,image_height))
    # image = tf.cast(image, tf.int32)
    label = record['label']
    return image, label
train_dataset = raw_train_dataset.map(prepare_record, num_parallel_calls=tf.data.experimental.AUTOTUNE).shuffle(1034).batch(517).prefetch(tf.data.experimental.AUTOTUNE)
for train_image_batch, train_label_batch in train_dataset:
    train_one_hot_y = tf.one_hot(train_label_batch, NUM_CLASSES)
    history = model.fit(train_image_batch, train_one_hot_y, epochs=10, verbose=0, validation_split=0.2)
plt.title('model accuracy')
plt.legend(['train', 'val'], loc='upper left')
plt.title('model loss')
plt.legend(['train', 'val'], loc='upper left')
In the fit function there is a parameter called shuffle:
Boolean (whether to shuffle the training data before each epoch) or str (for 'batch'). 'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-sized chunks. Has no effect when steps_per_epoch is not None.
If you set it to False, the results should be the same on every run.
Another, probably preferable, way would be to use tf.random.set_seed(seed), so that the shuffling is always performed in the same way (see docs).
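The effect of fixing a seed can be illustrated with Python's random module, as an analogy for tf.random.set_seed rather than TensorFlow's actual mechanism. shuffled_order is a hypothetical helper, and 1034 simply reuses the shuffle buffer size from the question: with the same seed, a "second run" visits the samples in exactly the same order, while the order is still shuffled.

```python
import random


def shuffled_order(n, seed=None):
    """Order in which n samples are visited in one epoch.
    seed=None: a different order on every run; fixed seed: reproducible."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


run_a = shuffled_order(1034, seed=42)
run_b = shuffled_order(1034, seed=42)  # a "second run" with the same seed
print(run_a == run_b)  # True
print(run_a == list(range(1034)))  # False: it is still shuffled
```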
I'm trying to compute gradients with TensorFlow in eager mode, but
tf.GradientTape() returns only None values, and I can't understand why.
The gradients are computed in the update_policy() function.
The output of the line:
grads = tape.gradient(loss, self.model.trainable_variables)
{list}<class 'list'>:[None, None, ... ,None]
Here is the code.
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
import numpy as np
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
class PGEagerAtariNetwork:
    def __init__(self, state_space, action_space, lr, gamma):
        self.state_space = state_space
        self.action_space = action_space
        self.gamma = gamma
        self.model = tf.keras.Sequential()
        # Conv
        self.model.add(
            tf.keras.layers.Conv2D(filters=32, kernel_size=[8, 8], strides=[4, 4], activation='relu',
                                   input_shape=(84, 84, 4,), name='conv1'))
        self.model.add(
            tf.keras.layers.Conv2D(filters=64, kernel_size=[4, 4], strides=[2, 2], activation='relu', name='conv2'))
        self.model.add(
            tf.keras.layers.Conv2D(filters=128, kernel_size=[4, 4], strides=[2, 2], activation='relu', name='conv3'))
        # Fully connected
        self.model.add(tf.keras.layers.Dense(units=512, activation='relu', name='fc1'))
        self.model.add(tf.keras.layers.Dropout(rate=0.4, name='dr1'))
        self.model.add(tf.keras.layers.Dense(units=256, activation='relu', name='fc2'))
        self.model.add(tf.keras.layers.Dropout(rate=0.3, name='dr2'))
        self.model.add(tf.keras.layers.Dense(units=128, activation='relu', name='fc3'))
        self.model.add(tf.keras.layers.Dropout(rate=0.1, name='dr3'))
        # Logits
        self.model.add(tf.keras.layers.Dense(units=self.action_space, activation=None, name='logits'))
        # Optimizer
        self.optimizer = tf.train.AdamOptimizer(learning_rate=lr)
    def get_probs(self, s):
        s = s[np.newaxis, :]
        logits = self.model.predict(s)
        probs = tf.nn.softmax(logits).numpy()
        return probs
    def update_policy(self, s, r, a):
        with tf.GradientTape() as tape:
            logits = self.model.predict(s)
            policy_loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=a, logits=logits)
            policy_loss = policy_loss * tf.stop_gradient(r)
            loss = tf.reduce_mean(policy_loss)
        grads = tape.gradient(loss, self.model.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.model.trainable_variables))
Your code never runs a forward pass that the tape can record. Model.predict() returns a NumPy array, so the computation is not recorded by tf.GradientTape and the loss ends up disconnected from the model's variables. Take a look at this example:
Given the following data and model:
import tensorflow as tf
import numpy as np
x_train = tf.convert_to_tensor(np.ones((1, 2), np.float32), dtype=tf.float32)
y_train = tf.convert_to_tensor([[0, 1]])
model = tf.keras.models.Sequential([tf.keras.layers.Dense(2, input_shape=(2, ))])
First we use predict():
with tf.GradientTape() as tape:
    logits = model.predict(x_train)
    print('`logits` has type {0}'.format(type(logits)))
    # `logits` has type <class 'numpy.ndarray'>
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_train, logits=logits)
    reduced = tf.reduce_mean(xentropy)
grads = tape.gradient(reduced, model.trainable_variables)
print('grads are: {0}'.format(grads))
# grads are: [None, None]
Now we call the model directly:
with tf.GradientTape() as tape:
    logits = model(x_train)
    print('`logits` has type {0}'.format(type(logits)))
    # `logits` has type <class 'tensorflow.python.framework.ops.EagerTensor'>
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_train, logits=logits)
    reduced = tf.reduce_mean(xentropy)
grads = tape.gradient(reduced, model.trainable_variables)
print('grads are: {0}'.format(grads))
# grads are: [<tf.Tensor: id=2044, shape=(2, 2), dtype=float32, numpy=
# array([[ 0.77717704, -0.777177 ],
# [ 0.77717704, -0.777177 ]], dtype=float32)>, <tf.Tensor: id=2042,
# shape=(2,), dtype=float32, numpy=array([ 0.77717704, -0.777177 ], dtype=float32)>]
So use the model's __call__() (i.e. model(x)) for the forward pass, not predict().
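The same effect can be seen in a toy reverse-mode autodiff tape written in plain Python. This is a sketch of the general idea only, not TensorFlow's implementation: Var and gradient are hypothetical names. Values computed from recorded operations stay connected to the weight, whereas rebuilding a value from a plain number (as happens when predict() hands back a NumPy array) drops the history, so the gradient comes back as None.

```python
class Var:
    """A scalar that remembers how it was computed (toy reverse-mode autodiff)."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent Var, local derivative d(self)/d(parent)).
        self.parents = list(parents)

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value, [(self, other.value), (other, self.value)])


def gradient(target, sources):
    """Return d(target)/d(source) for each source, or None if the source is
    not connected to target through recorded operations."""
    order, seen = [], set()

    def visit(node):
        # Depth-first topological sort: parents are appended before children.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(target)
    grads = {id(target): 1.0}
    for node in reversed(order):  # from the target back towards the inputs
        upstream = grads.get(id(node))
        if upstream is None:
            continue
        for parent, local in node.parents:
            grads[id(parent)] = grads.get(id(parent), 0.0) + upstream * local
    return [grads.get(id(s)) for s in sources]


w = Var(3.0)  # plays the role of a trainable weight
x = Var(2.0)  # plays the role of the input

# Like model(x): the output stays connected to w.
connected_loss = w * x + Var(1.0)
# Like model.predict(x): only the plain number survives, the history is lost.
detached_loss = Var((w * x).value) + Var(1.0)

print(gradient(connected_loss, [w]))  # [2.0]
print(gradient(detached_loss, [w]))   # [None]
```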
I've been experimenting with TensorFlow's higher-level APIs recently and got some strange results: when I train a seemingly identical model with the same hyperparameters using the Keras Model API and the TensorFlow Estimator API, I get different results (Keras gives ~4% higher accuracy).
Here's my code:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization, Activation, Flatten
from tensorflow.keras.initializers import VarianceScaling
from tensorflow.keras.optimizers import Adam
# Load CIFAR-10 dataset and normalize pixel values
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
X_train = np.array(X_train, dtype=np.float32)
y_train = np.array(y_train, dtype=np.int32).reshape(-1)
X_test = np.array(X_test, dtype=np.float32)
y_test = np.array(y_test, dtype=np.int32).reshape(-1)
mean = X_train.mean(axis=(0, 1, 2), keepdims=True)
std = X_train.std(axis=(0, 1, 2), keepdims=True)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
y_train_one_hot = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test_one_hot = tf.keras.utils.to_categorical(y_test, num_classes=10)
# Define forward pass for a convolutional neural network.
# This function takes a batch of images as input and returns
# unscaled class scores (aka logits) from the last layer
def conv_net(X):
    initializer = VarianceScaling(scale=2.0)
    X = Conv2D(filters=32, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = MaxPooling2D()(X)
    X = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=128, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=256, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = GlobalAveragePooling2D()(X)
    X = Dense(10)(X)
    return X
# For training this model I use Adam optimizer with learning_rate=1e-3
# Train the model for 10 epochs using keras.Model API
def keras_model():
    inputs = Input(shape=(32,32,3))
    scores = conv_net(inputs)
    outputs = Activation('softmax')(scores)
    model = Model(inputs=inputs, outputs=outputs)
    return model
model1 = keras_model()
model1.fit(X_train, y_train_one_hot, batch_size=128, epochs=10)
results1 = model1.evaluate(X_test, y_test_one_hot)
# The above usually gives 79-82% accuracy
# Now train the same model for 10 epochs using tf.estimator.Estimator API
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_train}, y=y_train, \
batch_size=128, num_epochs=10, shuffle=True)
test_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_test}, y=y_test, \
batch_size=128, num_epochs=1, shuffle=False)
def tf_estimator(features, labels, mode, params):
    X = features['X']
    scores = conv_net(X)
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions={'scores': scores})
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=scores, labels=labels)
    metrics = {'accuracy': tf.metrics.accuracy(labels=labels, predictions=tf.argmax(scores, axis=-1))}
    optimizer = tf.train.AdamOptimizer(learning_rate=params['lr'], epsilon=params['epsilon'])
    step = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=tf.reduce_mean(loss), train_op=step, eval_metric_ops=metrics)
model2 = tf.estimator.Estimator(model_fn=tf_estimator, params={'lr': 3e-3, 'epsilon': tf.keras.backend.epsilon()})
model2.train(input_fn=train_input_fn)
results2 = model2.evaluate(input_fn=test_input_fn)
# This usually gives 75-78% accuracy
print('Keras accuracy:', results1[1])
print('Estimator accuracy:', results2['accuracy'])
I've trained both models 30 times, for 10 epochs each time: the mean accuracy of the model trained with Keras is 0.8035 and the mean accuracy of the model trained with the Estimator is 0.7631 (standard deviations 0.0065 and 0.0072, respectively). Accuracy is significantly higher with Keras. My question is: why is this happening? Am I doing something wrong or missing some important parameter? The architecture of the model is the same in both cases and I'm using the same hyperparameters (I've even set Adam's epsilon to the same value, although it doesn't really affect the overall result), but the accuracies are significantly different.
I also wrote a training loop using raw TensorFlow and got the same accuracy as with the Estimator API (lower than with Keras). That made me think that the default value of some parameter in Keras differs from TensorFlow's, but they all actually seem to be the same.
I have also tried other architectures, and sometimes the difference in accuracy was smaller, but I wasn't able to find any particular layer type that causes it. With a shallower network the difference often becomes smaller, but not always. For example, the difference in accuracy is even slightly bigger with the following model:
def simple_conv_net(X):
    initializer = VarianceScaling(scale=2.0)
    X = Conv2D(filters=32, kernel_size=5, strides=2, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, strides=1, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, strides=1, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Flatten()(X)
    X = Dense(10)(X)
    return X
Again, I've trained it for 10 epochs 30 times using Adam optimizer with 3e-3 learning rate. Mean accuracy with Keras is 0.6561 and mean accuracy with Estimator is 0.6101 (standard deviations are 0.0084 and 0.0111 respectively). What can be causing such a difference?
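To put the reported gaps in perspective, Welch's t statistic (a standard two-sample test that does not assume equal variances, used here as an illustration rather than taken from the question) can be computed from the means and standard deviations above, assuming the 30 runs per API are independent. Values around 23 and 18 are far beyond the roughly 2 that run-to-run noise would typically produce, so both gaps look systematic rather than random.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Numbers reported above: 30 runs per API, 10 epochs each (Keras first).
t_deep = welch_t(0.8035, 0.0065, 30, 0.7631, 0.0072, 30)
t_simple = welch_t(0.6561, 0.0084, 30, 0.6101, 0.0111, 30)
print(round(t_deep, 1), round(t_simple, 1))  # 22.8 18.1
```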