Inconsistent results when running the same neural network in TensorFlow vs Keras

I created two identical neural networks for the MNIST dataset, one using TensorFlow and one using Keras. After 10 epochs, Keras achieves over 96% accuracy, while TensorFlow only reaches about 70%.
I have tested the code below on other datasets as well, and TensorFlow achieved much lower performance on all of them with the same parameters.
Keras Code:
import warnings
warnings.filterwarnings('ignore')
import keras
from keras.datasets import mnist
# Loading MNIST
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Converting the y-value column to an array of classes (one-hot encoding)
from keras.utils import np_utils
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
# Changing the shape of input images and normalizing
x_train = x_train.reshape((60000, 784))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 784))
x_test = x_test.astype('float32') / 255
# Making the neural network
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
model.add(Dense(30, input_dim=784, kernel_initializer='normal', activation='relu'))
model.add(Dense(30, kernel_initializer='normal', activation='relu'))
model.add(Dense(10, kernel_initializer='normal', activation='softmax'))
from keras.optimizers import Adam
optimizer = Adam()
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
# Training and showing the results
model.fit(x_train, y_train, epochs=10, batch_size=200, validation_data=(x_test, y_test), verbose=1)
TensorFlow Code:
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf
#Loading MNIST
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Epochs parameters
epochs = 10
batch_size = 200
# Neural network parameters
n_input = 784
n_hidden_1 = 30
n_hidden_2 = 30
n_classes = 10
# Placeholders x, y
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
# Creating the first layer
w1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
b1 = tf.Variable(tf.random_normal([n_hidden_1]))
layer_1 = tf.nn.relu(tf.add(tf.matmul(x,w1),b1))
# Creating the second layer
w2 = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
b2 = tf.Variable(tf.random_normal([n_hidden_2]))
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1,w2),b2))
# Creating the output layer
w_out = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
bias_out = tf.Variable(tf.random_normal([n_classes]))
output = tf.add(tf.matmul(layer_2, w_out), bias_out)
# Loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = output, labels = y))
# Optimizer
optimizer = tf.train.AdamOptimizer().minimize(cost)
# Making predictions
predictions = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
# Accuracy
accuracy = tf.reduce_mean(tf.cast(predictions, tf.float32))
# Initializing the variables
init = tf.global_variables_initializer()
# Opening the session
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(epochs):
        avg_cost = 0.0
        total_batches = int(mnist.train.num_examples / batch_size)
        # Loop through all batch iterations
        for i in range(total_batches):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Fit training
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
            # Computing the average cost of a complete epoch
            avg_cost += sess.run(cost, feed_dict={x: batch_x, y: batch_y}) / total_batches
        # Running accuracy (with test data) on each epoch
        accuracy_test = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        # Showing results after each epoch
        print("Epoch: ", "{},".format(epoch + 1), "Average cost = ", "{:.3f}".format(avg_cost))
        print("Accuracy Test = ", "{:.3f}".format(accuracy_test))
    print("Training completed!")
    print("Model Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
Could anyone help me understand what may be causing this divergence?

Normalization drastically improves a model's accuracy.
In the Keras code you normalize the data, but I can't find any normalization in the TensorFlow code, unless the MNIST data loaded by TensorFlow is already normalized at the source.
Please verify that the MNIST data in the TensorFlow code is also normalized.
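A quick way to check is to print the value range of the images returned by read_data_sets and rescale only if needed. A minimal sketch, using the same mnist object as in the question:
# Check the pixel range of the TF-loaded MNIST data.
print(mnist.train.images.min(), mnist.train.images.max())
# If the maximum is 255.0 rather than 1.0, rescale each batch before feeding it:
# batch_x = batch_x / 255.0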

Related

Tensorflow keras, no gradients error when use MSE

I am trying the code from the TensorFlow guide "Writing a training loop from scratch" with some changes of my own. I changed the loss function from SparseCategoricalCrossentropy to MeanSquaredError, and I also changed the architecture of the model by adding a new Lambda layer for the loss calculation. However, I get a ValueError saying that no gradients are provided for any variable. Is there any way to make the code run with MSE?
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
inputs = keras.Input(shape=(784,), name="digits")
x1 = layers.Dense(64, activation="relu")(inputs)
x2 = layers.Dense(64, activation="relu")(x1)
outputs = layers.Dense(10, name="predictions")(x2)
final_outputs = layers.Lambda(lambda x: tf.math.argmax(x, axis = -1))(outputs)
model = keras.Model(inputs=inputs, outputs=final_outputs)
# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.MeanSquaredError()
# Prepare the training dataset.
batch_size = 64
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
# Reserve 10,000 samples for validation.
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
# Prepare the training dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)
epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
The argmax op is not differentiable, so no gradients can flow back through your Lambda layer; remove it. To use integer labels with an MSE loss, make sure y_train and y_val are plain integers (mnist.load_data() already returns them that way; if they were one-hot encoded you would convert them back with):
y_train = np.argmax(y_train, axis=-1)
y_val = np.argmax(y_val, axis=-1)
And adjust the output layer to produce a single value to regress against the integer label:
outputs = layers.Dense(1, name="predictions")(x2)
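Putting that together, a minimal sketch of the adjusted model (same names as in the question, with the argmax Lambda layer removed):
# Sketch: regress a single output against the integer label with MSE.
inputs = keras.Input(shape=(784,), name="digits")
x1 = layers.Dense(64, activation="relu")(inputs)
x2 = layers.Dense(64, activation="relu")(x1)
outputs = layers.Dense(1, name="predictions")(x2)  # one real-valued output, no argmax
model = keras.Model(inputs=inputs, outputs=outputs)
loss_fn = keras.losses.MeanSquaredError()  # labels stay as integers 0-9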

Model.fit() output in Tensorflow 2

I am trying out TF 2.0 on the MNIST dataset with a batch size of 128.
During training I get the following output for each epoch. If my batch size is 128, what does the 375 signify?
Epoch 13/200
375/375 [==============================] - 2s 4ms/step - loss: 0.2351 - accuracy: 0.9330 - val_loss: 0.2258 - val_accuracy: 0.9370
import tensorflow as tf
import numpy as np
from tensorflow import keras
# Network and training parameters.
EPOCHS = 200
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 10 # number of outputs = number of digits
N_HIDDEN = 128
VALIDATION_SPLIT = 0.2 # how much TRAIN is reserved for VALIDATION
# Loading the MNIST dataset.
# You can verify that the split between train and test is 60,000 and 10,000 respectively.
# One-hot encoding of the labels is applied below.
mnist = keras.datasets.mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# X_train is 60000 rows of 28x28 values; we --> reshape it to
# 60000 x 784.
RESHAPED = 784
#
X_train = X_train.reshape(60000, RESHAPED)
X_test = X_test.reshape(10000, RESHAPED)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# Normalize inputs to be within in [0, 1].
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# One-hot representation of the labels.
Y_train = tf.keras.utils.to_categorical(Y_train, NB_CLASSES)
Y_test = tf.keras.utils.to_categorical(Y_test, NB_CLASSES)
model = tf.keras.models.Sequential()
model.add(keras.layers.Dense(N_HIDDEN,
                             input_shape=(RESHAPED,),
                             name='dense_layer', activation='relu'))
model.add(keras.layers.Dense(N_HIDDEN,
                             name='dense_layer_2', activation='relu'))
model.add(keras.layers.Dense(NB_CLASSES,
                             name='dense_layer_3', activation='softmax'))
# Compiling the model.
model.compile(optimizer='SGD',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Training the model.
model.fit(X_train, Y_train,
          batch_size=BATCH_SIZE, epochs=EPOCHS,
          verbose=VERBOSE, validation_split=VALIDATION_SPLIT)
The 375 is the number of steps (batches) your model processes to complete one epoch:
steps_per_epoch = len(X_train) // batch_size
In your case, however, since you passed validation_split=0.2 to model.fit(), only 80% of the training data is used for training:
steps_per_epoch = (len(X_train) * (1 - VALIDATION_SPLIT)) // batch_size
Plugging in the values:
steps = (60000 * (1 - 0.2)) // 128 = 375
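As a quick sanity check, the same arithmetic in code, using the constants from the question:
steps_per_epoch = int(len(X_train) * (1 - VALIDATION_SPLIT)) // BATCH_SIZE
print(steps_per_epoch)  # (60000 * 0.8) // 128 = 375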

Improve Prediction Accuracy of Keras Neural Network Algorithm

How can I modify my code below to improve its current accuracy of 71.38%? The dataset consists of 6 independent numeric variables and 1 dependent variable that is either 0 or 1.
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense
dataset = loadtxt('numbers/numbers.csv', delimiter=',')
X = dataset[:,0:6]
y = dataset[:,6]
model = Sequential()
model.add(Dense(32, input_dim=6, activation='softsign'))
model.add(Dense(16, activation='softsign'))
model.add(Dense(1, activation='softsign'))
model.compile(loss='binary_crossentropy', optimizer='SGD', metrics=['accuracy'])
model.fit(X, y, epochs=50, batch_size=100)
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
predictions = model.predict_classes(X)
for i in range(5):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))
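One detail worth checking in the posted code: for a 0/1 target trained with binary_crossentropy, the output layer is normally a single sigmoid unit, since softsign ranges over (-1, 1) and cannot represent a probability. A minimal sketch of that change, keeping everything else as in the question:
model = Sequential()
model.add(Dense(32, input_dim=6, activation='softsign'))
model.add(Dense(16, activation='softsign'))
model.add(Dense(1, activation='sigmoid'))  # sigmoid output matches binary_crossentropy
model.compile(loss='binary_crossentropy', optimizer='SGD', metrics=['accuracy'])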

How to modify weights of Keras layers?

I am trying to freeze some of the weights of a layer by setting them to a specific value in Keras. How can I achieve this without moving the weights to the CPU?
I checked similar questions such as modify layer weights in keras and modify layer parameters in keras.
Answers suggest using get_weights() and set_weights(); however, those functions move the weights between the CPU and the GPU.
I created a custom Lambda layer and modified model.trainable_weights inside of that layer, but the weights are not updated.
I used the TF advanced tutorial and just added a custom Lambda layer that multiplies the weights by zero.
Colab notebook with same code
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Lambda
from tensorflow.keras import Model
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# Add a channels dimension
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]
# Batch and shuffle the data (as in the tutorial; needed for the training loop below)
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000).batch(32)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
def antirectifier(x):
    for i, w in enumerate(model.trainable_weights):
        model.trainable_weights[i] = tf.multiply(w, 0)
    return x

class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = Conv2D(32, 3, activation='relu')
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')
        self.mask = Lambda(antirectifier)

    def call(self, x):
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.mask(x)
        return self.d2(x)
# Create an instance of the model
model = MyModel()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')
#tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)

#tf.function
def test_step(images, labels):
    predictions = model(images)
    t_loss = loss_object(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)

EPOCHS = 5
for epoch in range(EPOCHS):
    for images, labels in train_ds:
        train_step(images, labels)
    for test_images, test_labels in test_ds:
        test_step(test_images, test_labels)
    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch + 1,
                          train_loss.result(),
                          train_accuracy.result() * 100,
                          test_loss.result(),
                          test_accuracy.result() * 100))
    # Reset the metrics for the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()
Since the weights are set to zero, the accuracy should drop. However, the weights are not changed.
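A minimal sketch of one possible approach (an assumption, not the thread's accepted method): modify the variables in place with tf.Variable.assign, which runs on whatever device already holds the variable, so nothing is copied through the CPU:
# Sketch: zero out (or mask) a layer's kernel in place with assign().
# Assumes the model has already been built, e.g. by running one batch through it.
kernel = model.d1.kernel                 # a tf.Variable living on the GPU
kernel.assign(tf.zeros_like(kernel))     # in-place update, no host round-trip
# or apply an arbitrary mask: kernel.assign(kernel * mask)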

Training the same model with Keras Model API and TensorFlow Estimator API gives different accuracies

I've been experimenting with TensorFlow's higher-level APIs recently and got some strange results: when I train a seemingly identical model with the same hyperparameters using the Keras Model API and the TensorFlow Estimator API, I get different results (using Keras leads to ~4% higher accuracy).
Here's my code:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization, Activation, Flatten
from tensorflow.keras.initializers import VarianceScaling
from tensorflow.keras.optimizers import Adam
# Load CIFAR-10 dataset and normalize pixel values
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
X_train = np.array(X_train, dtype=np.float32)
y_train = np.array(y_train, dtype=np.int32).reshape(-1)
X_test = np.array(X_test, dtype=np.float32)
y_test = np.array(y_test, dtype=np.int32).reshape(-1)
mean = X_train.mean(axis=(0, 1, 2), keepdims=True)
std = X_train.std(axis=(0, 1, 2), keepdims=True)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
y_train_one_hot = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test_one_hot = tf.keras.utils.to_categorical(y_test, num_classes=10)
# Define forward pass for a convolutional neural network.
# This function takes a batch of images as input and returns
# unscaled class scores (aka logits) from the last layer
def conv_net(X):
    initializer = VarianceScaling(scale=2.0)
    X = Conv2D(filters=32, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = MaxPooling2D()(X)
    X = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=128, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=256, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = GlobalAveragePooling2D()(X)
    X = Dense(10)(X)
    return X
# For training this model I use the Adam optimizer with learning_rate=3e-3
# Train the model for 10 epochs using the keras.Model API
def keras_model():
    inputs = Input(shape=(32, 32, 3))
    scores = conv_net(inputs)
    outputs = Activation('softmax')(scores)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=Adam(lr=3e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
model1 = keras_model()
model1.fit(X_train, y_train_one_hot, batch_size=128, epochs=10)
results1 = model1.evaluate(X_test, y_test_one_hot)
print(results1)
# The above usually gives 79-82% accuracy
# Now train the same model for 10 epochs using tf.estimator.Estimator API
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_train}, y=y_train,
                                                    batch_size=128, num_epochs=10, shuffle=True)
test_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_test}, y=y_test,
                                                   batch_size=128, num_epochs=1, shuffle=False)

def tf_estimator(features, labels, mode, params):
    X = features['X']
    scores = conv_net(X)
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions={'scores': scores})
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=scores, labels=labels)
    metrics = {'accuracy': tf.metrics.accuracy(labels=labels, predictions=tf.argmax(scores, axis=-1))}
    optimizer = tf.train.AdamOptimizer(learning_rate=params['lr'], epsilon=params['epsilon'])
    step = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=tf.reduce_mean(loss), train_op=step, eval_metric_ops=metrics)
model2 = tf.estimator.Estimator(model_fn=tf_estimator, params={'lr': 3e-3, 'epsilon': tf.keras.backend.epsilon()})
model2.train(input_fn=train_input_fn)
results2 = model2.evaluate(input_fn=test_input_fn)
print(results2)
# This usually gives 75-78% accuracy
print('Keras accuracy:', results1[1])
print('Estimator accuracy:', results2['accuracy'])
I've trained both models 30 times, for 10 epochs each time: the mean accuracy of the model trained with Keras is 0.8035 and the mean accuracy of the model trained with the Estimator is 0.7631 (standard deviations are 0.0065 and 0.0072 respectively). Accuracy is significantly higher if I use Keras. My question is: why is this happening? Am I doing something wrong or missing some important parameters? The architecture of the model is the same in both cases and I'm using the same hyperparameters (I've even set Adam's epsilon to the same value, although it doesn't really affect the overall result), yet the accuracies are significantly different.
I also wrote a training loop using raw TensorFlow and got the same accuracy as with the Estimator API (lower than I get with Keras). That made me think that the default value of some parameter in Keras differs from TensorFlow, but they all actually seem to be the same.
I have also tried other architectures and sometimes got a smaller difference in accuracies, but I wasn't able to find any particular layer type that causes the difference. It looks like the difference often becomes smaller if I use a shallower network, but not always. For example, the difference in accuracies is even slightly bigger with the following model:
def simple_conv_net(X):
    initializer = VarianceScaling(scale=2.0)
    X = Conv2D(filters=32, kernel_size=5, strides=2, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, strides=1, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, strides=1, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Flatten()(X)
    X = Dense(10)(X)
    return X
Again, I trained it for 10 epochs, 30 times, using the Adam optimizer with a 3e-3 learning rate. The mean accuracy with Keras is 0.6561 and the mean accuracy with the Estimator is 0.6101 (standard deviations are 0.0084 and 0.0111 respectively). What could be causing such a difference?
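One difference worth ruling out (a guess, not something confirmed for this code): in a TF1-style custom model_fn, the BatchNormalization moving-average updates are collected in tf.GraphKeys.UPDATE_OPS and only run if the train op explicitly depends on them, whereas Keras wires those updates in automatically. A minimal sketch of adding that dependency inside tf_estimator:
# Sketch (assumption): make the train op depend on the batch-norm update ops.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    step = optimizer.minimize(loss, global_step=tf.train.get_global_step())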