Forward Pass calculation on current batch in "get_updates" method of Keras SGD Optimizer - tensorflow

I am trying to implement a stochastic armijo rule in the get_gradient method of Keras SGD optimizer.
Therefore, I need to calculate another forward pass to check if the learning_rate chosen was good. I don't want another calculation of the gradients, but I want to use the updated weights.
Using Keras Version 2.3.1 and Tensorflow Version 1.14.0
def get_updates(self, loss, params):
grads = self.get_gradients(loss, params)
self.updates = [K.update_add(self.iterations, 1)]
lr = self.learning_rate
if self.initial_decay > 0:
lr = lr * (1. / (1. + self.decay * K.cast(self.iterations,
K.dtype(self.decay))))
# momentum
shapes = [K.int_shape(p) for p in params]
moments = [K.zeros(shape, name='moment_' + str(i))
for (i, shape) in enumerate(shapes)]
self.weights = [self.iterations] + moments
for p, g, m in zip(params, grads, moments):
v = self.momentum * m - lr * g # velocity
self.updates.append(K.update(m, v))
if self.nesterov:
new_p = p + self.momentum * v - lr * g
else:
new_p = p + v
# Apply constraints.
if getattr(p, 'constraint', None) is not None:
new_p = p.constraint(new_p)
self.updates.append(K.update(p, new_p))
### own changes ###
if self.armijo:
inputs = (model._feed_inputs +
model._feed_targets +
model._feed_sample_weights)
input_layer = model.layers[0].input
armijo_function = K.function(inputs=input_layer, outputs=[loss],
updates=self.updates,name='armijo')
loss_next= armijo_function(inputs)
[....change updates if learning rate was not good enough...]
return self.updates
Unfortunately, I don't understand the error message when trying to calculate "loss_next":
tensorflow.python.framework.errors_impl.InvalidArgumentError: Requested Tensor connection between nodes "conv2d_1_input" and "conv2d_1_input" would create a cycle.
Two questions here:
how to access the current batch I am working on? The forward calculation should only consider the actual batch and as the gradients also belong only to that batch.
any better ideas to not use K.function for updating and evaluating a forward pass to calculate the loss function on that batch?
Anyone who can help? Thanks in advance.

how to access the current batch I am working on? The forward calculation should only consider the actual batch and as the gradients also belong only to that batch.
For this you can use batch_size = Total training records in model.fit() so that every epoch has just one forward pass and back propagation. Thus you can analysis the gradients on epoch 1 and modify the learning rate for epoch 2 OR if you are using the custom training loop then modify the code accordingly.
any better ideas to not use K.function for updating and evaluating a forward pass to calculate the loss function on that batch?
I do not recall any other option to evaluate gradient apart from using from tensorflow.keras import backend as K in tensorflow version 1.x. The best option is to update tensorflow to latest version 2.2.0 and use tf.GradientTape.
Would recommend to go through this answer to capture gradients using from tensorflow.keras import backend as K in tensorflow 1.x.
Below is a sample code which is almost similar to your requirement. I am using tensorflow version 2.2.0. You can build your requirements from this program.
We are doing below functions in the program -
We are altering the Learning rate after every epoch. You can do that using callbacks argument of model.fit. Here I am incrementing learning rate by 0.01 for every epoch using tf.keras.callbacks.LearningRateScheduler and also displaying it at end of every epoch using tf.keras.callbacks.Callback.
Computing the gradient using tf.GradientTape() after end of every epoch. We are collecting the grads of every epoch to a list using append.
Also have set batch_size=len(train_images)as per your requirement.
Note : I am training on just 500 records from Cifar dataset due to memory constraints.
Code -
%tensorflow_version 2.x
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K
import os
import numpy as np
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()
train_images = train_images[:500]
train_labels = train_labels[:500]
test_images = test_images[:50]
test_labels = test_labels[:50]
model = Sequential([
Conv2D(16, 3, padding='same', activation='relu', input_shape=(32, 32, 3)),
MaxPooling2D(),
Conv2D(32, 3, padding='same', activation='relu'),
MaxPooling2D(),
Conv2D(64, 3, padding='same', activation='relu'),
MaxPooling2D(),
Flatten(),
Dense(512, activation='relu'),
Dense(10)
])
lr = 0.01
adam = Adam(lr)
# Define the Gradient Fucntion
epoch_gradient = []
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Define the Required Callback Function
class GradientCalcCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
with tf.GradientTape() as tape:
logits = model(train_images, training=True)
loss = loss_fn(train_labels, logits)
grad = tape.gradient(loss, model.trainable_weights)
model.optimizer.apply_gradients(zip(grad, model.trainable_variables))
epoch_gradient.append(grad)
gradcalc = GradientCalcCallback()
# Define the Required Callback Function
class printlearningrate(tf.keras.callbacks.Callback):
def on_epoch_begin(self, epoch, logs={}):
optimizer = self.model.optimizer
lr = K.eval(optimizer.lr)
Epoch_count = epoch + 1
print('\n', "Epoch:", Epoch_count, ', LR: {:.2f}'.format(lr))
printlr = printlearningrate()
def scheduler(epoch):
optimizer = model.optimizer
return K.eval(optimizer.lr + 0.01)
updatelr = tf.keras.callbacks.LearningRateScheduler(scheduler)
model.compile(optimizer=adam,
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
epochs = 10
history = model.fit(train_images, train_labels, epochs=epochs, batch_size=len(train_images),
validation_data=(test_images, test_labels),
callbacks = [printlr,updatelr,gradcalc])
# (7) Convert to a 2 dimensiaonal array of (epoch, gradients) type
gradient = np.asarray(epoch_gradient)
print("Total number of epochs run:", epochs)
print("Gradient Array has the shape:",gradient.shape)
Output -
Epoch: 1 , LR: 0.01
Epoch 1/10
1/1 [==============================] - 0s 427ms/step - loss: 30.1399 - accuracy: 0.0820 - val_loss: 2114.8201 - val_accuracy: 0.1800 - lr: 0.0200
Epoch: 2 , LR: 0.02
Epoch 2/10
1/1 [==============================] - 0s 329ms/step - loss: 141.6176 - accuracy: 0.0920 - val_loss: 41.7008 - val_accuracy: 0.0400 - lr: 0.0300
Epoch: 3 , LR: 0.03
Epoch 3/10
1/1 [==============================] - 0s 328ms/step - loss: 4.1428 - accuracy: 0.1160 - val_loss: 2.3883 - val_accuracy: 0.1800 - lr: 0.0400
Epoch: 4 , LR: 0.04
Epoch 4/10
1/1 [==============================] - 0s 329ms/step - loss: 2.3545 - accuracy: 0.1060 - val_loss: 2.3471 - val_accuracy: 0.1800 - lr: 0.0500
Epoch: 5 , LR: 0.05
Epoch 5/10
1/1 [==============================] - 0s 340ms/step - loss: 2.3208 - accuracy: 0.1060 - val_loss: 2.3047 - val_accuracy: 0.1800 - lr: 0.0600
Epoch: 6 , LR: 0.06
Epoch 6/10
1/1 [==============================] - 0s 331ms/step - loss: 2.3048 - accuracy: 0.1300 - val_loss: 2.3069 - val_accuracy: 0.0600 - lr: 0.0700
Epoch: 7 , LR: 0.07
Epoch 7/10
1/1 [==============================] - 0s 337ms/step - loss: 2.3041 - accuracy: 0.1340 - val_loss: 2.3432 - val_accuracy: 0.0600 - lr: 0.0800
Epoch: 8 , LR: 0.08
Epoch 8/10
1/1 [==============================] - 0s 341ms/step - loss: 2.2871 - accuracy: 0.1400 - val_loss: 2.6009 - val_accuracy: 0.0800 - lr: 0.0900
Epoch: 9 , LR: 0.09
Epoch 9/10
1/1 [==============================] - 1s 515ms/step - loss: 2.2810 - accuracy: 0.1440 - val_loss: 2.8530 - val_accuracy: 0.0600 - lr: 0.1000
Epoch: 10 , LR: 0.10
Epoch 10/10
1/1 [==============================] - 0s 343ms/step - loss: 2.2954 - accuracy: 0.1300 - val_loss: 2.3049 - val_accuracy: 0.0600 - lr: 0.1100
Total number of epochs run: 10
Gradient Array has the shape: (10, 10)
Hope this answers your question. Happy Learning.

Related

Is the loss function wrong in the following code for binary classification of images using soft labels? Or is there some other problem?

We are using CNN to classify images with labels 0 and 1 in tensorflow.
However, in reality, images have probability values between 0 and 1, not one-hot labels of 0 and 1. Images with probabilities in the range [0, 0.5) are labeled 0, and images in the range [0.5, 1.0] are labeled 1. I want to check whether the classification performance is better if binary classification is performed using soft labels between 0 and 1 instead of one-hot labels.
The code below is an example of binary classification only with data labeled 0 and 1 in the cifar10 dataset.
In the code below, the accuracy is about 98% without 'making soft labels part', but about 48% with 'making soft labels part'.
Should I modify the 'BinaryCrossEntropy_custom' function, which is the loss function, to solve the problem? Or is something else wrong?
This answer says that using logits solves it. I understand soft_labels argument, but what value should I put in logits argument in this example code?
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras import optimizers
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications import vgg16
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
# Rewrite the binary cross entropy function. We will modify this function to return a loss that fits the soft label later.
def BinaryCrossEntropy_custom(y_true, y_pred):
y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
term_0 = (1 - y_true) * K.log(1 - y_pred + K.epsilon())
term_1 = y_true * K.log(y_pred + K.epsilon())
return -K.mean(term_0 + term_1, axis=0)
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
# Only data with labels 0 and 1 are used.
train_ind01 = np.where((train_labels == 0) | (train_labels == 1))[0]
test_ind01 = np.where((test_labels == 0) | (test_labels == 1))[0]
train_images = train_images[train_ind01, :, :, :]
test_images = test_images[test_ind01, :, :, :]
train_labels = train_labels[train_ind01, :]
test_labels = test_labels[test_ind01, :]
train_labels = np.array(train_labels).astype('float64')
test_labels = np.array(test_labels).astype('float64')
# making soft labels part start
# Samples with label 0 are replaced with labels in the range [0,0.2],
# and samples with label 1 are replaced by labels in the range [0.8, 1.0].
sampl_train = np.random.uniform(low=-0.2, high=0.2, size=train_labels.shape)
sampl_test = np.random.uniform(low=-0.2, high=0.2, size=test_labels.shape)
train_labels = train_labels + sampl_train
test_labels = test_labels + sampl_test
train_labels = np.clip(train_labels, 0.0, 1.0)
test_labels = np.clip(test_labels, 0.0, 1.0)
# making soft labels part end
vgg = vgg16.VGG16(include_top=False, weights='imagenet', input_shape=(32, 32, 3))
output = vgg.layers[-1].output
output = layers.Flatten()(output)
output = layers.Dense(512, activation='relu')(output)
output = layers.Dropout(0.2)(output)
output = layers.Dense(256, activation='relu')(output)
output = layers.Dropout(0.2)(output)
predictions = layers.Dense(units=1, activation="sigmoid")(output)
model = Model(inputs=vgg.input, outputs=predictions)
model.compile(optimizer=Adam(learning_rate=.0001), loss=BinaryCrossEntropy_custom, metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=100,
validation_data=(test_images, test_labels))
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(test_acc)
The console output with making soft labels part is
Epoch 1/100
2022-09-16 15:29:29.136931: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8101
313/313 [==============================] - 17s 42ms/step - loss: 0.2951 - accuracy: 0.4779 - val_loss: 0.2775 - val_accuracy: 0.4650
Epoch 2/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2419 - accuracy: 0.4931 - val_loss: 0.2488 - val_accuracy: 0.4695
Epoch 3/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2290 - accuracy: 0.4978 - val_loss: 0.2424 - val_accuracy: 0.4740
Epoch 4/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2161 - accuracy: 0.5002 - val_loss: 0.2404 - val_accuracy: 0.4765
Epoch 5/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2139 - accuracy: 0.5007 - val_loss: 0.2620 - val_accuracy: 0.4730
Epoch 6/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2118 - accuracy: 0.5023 - val_loss: 0.2480 - val_accuracy: 0.4745
Epoch 7/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2097 - accuracy: 0.5019 - val_loss: 0.2350 - val_accuracy: 0.4775
Epoch 8/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2098 - accuracy: 0.5024 - val_loss: 0.2289 - val_accuracy: 0.4780
Epoch 9/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2034 - accuracy: 0.5039 - val_loss: 0.2364 - val_accuracy: 0.4780
Epoch 10/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2025 - accuracy: 0.5040 - val_loss: 0.2481 - val_accuracy: 0.4720

Classification with PyTorch is much slower than Tensorflow: 42min vs. 11min

I have been a Tensorflow user and start to use Pytorch. As a trial, I implemented simple classification tasks with both libraries.
However, PyTorch is much slower than Tensorflow: Pytorch takes 42min while TensorFlow 11min. I referred to PyTorch official Tutorial, and made only little change from it.
Could anyone share some advice for this problem?
Here is the summary what I tried.
environment: Colab Pro+
dataset: Cifar10
classifier: VGG16
optimizer: Adam
loss: crossentropy
batch size: 32
PyTorch
Code:
import torch, torchvision
from torch import nn
from torchvision import transforms, models
from tqdm import tqdm
import time, copy
trans = transforms.Compose([transforms.Resize((224, 224)),
transforms.ToTensor(),])
data = {phase: torchvision.datasets.CIFAR10('./', train = (phase=='train'), transform=trans, download=True) for phase in ['train', 'test']}
dataloaders = {phase: torch.utils.data.DataLoader(data[phase], batch_size=32, shuffle=True) for phase in ['train', 'test']}
def train_model(model, criterion, optimizer, dataloaders, device, num_epochs=5):
since = time.time()
best_model_wts = copy.deepcopy(model.state_dict())
best_acc = 0.0
for epoch in range(num_epochs):
print('Epoch {}/{}'.format(epoch, num_epochs - 1))
print('-' * 10)
# Each epoch has a training and validation phase
for phase in ['train', 'test']:
if phase == 'train':
model.train() # Set model to training mode
else:
model.eval() # Set model to evaluate mode
running_loss = 0.0
running_corrects = 0
# Iterate over data.
for inputs, labels in tqdm(iter(dataloaders[phase])):
inputs = inputs.to(device)
labels = labels.to(device)
# zero the parameter gradients
optimizer.zero_grad()
# forward
# track history if only in train
with torch.set_grad_enabled(phase == 'train'):
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)
# backward + optimize only if in training phase
if phase == 'train':
loss.backward()
optimizer.step()
# statistics
running_loss += loss.item() * inputs.size(0)
running_corrects += torch.sum(preds == labels.data)
epoch_loss = running_loss / len(dataloaders[phase])
epoch_acc = running_corrects.double() / len(dataloaders[phase])
print('{} Loss: {:.4f} Acc: {:.4f}'.format(
phase, epoch_loss, epoch_acc))
# deep copy the model
if phase == 'test' and epoch_acc > best_acc:
best_acc = epoch_acc
best_model_wts = copy.deepcopy(model.state_dict())
print()
time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(
time_elapsed // 60, time_elapsed % 60))
print('Best val Acc: {:4f}'.format(best_acc))
# load best model weights
model.load_state_dict(best_model_wts)
return model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = models.vgg16(pretrained=False)
model = model.to(device)
model = train_model(model=model,
criterion=nn.CrossEntropyLoss(),
optimizer=torch.optim.Adam(model.parameters(), lr=0.001),
dataloaders=dataloaders,
device=device,
)
Result:
Epoch 0/4
----------
0%| | 0/1563 [00:00<?, ?it/s]/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
100%|██████████| 1563/1563 [07:50<00:00, 3.32it/s]
train Loss: 75.5199 Acc: 3.2809
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.7274 Acc: 3.1949
Epoch 1/4
----------
100%|██████████| 1563/1563 [07:50<00:00, 3.33it/s]
train Loss: 73.8162 Acc: 3.2514
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]
test Loss: 73.6114 Acc: 3.1949
Epoch 2/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7741 Acc: 3.1369
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.5873 Acc: 3.1949
Epoch 3/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7493 Acc: 3.1331
100%|██████████| 313/313 [00:38<00:00, 8.12it/s]
test Loss: 73.6191 Acc: 3.1949
Epoch 4/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7289 Acc: 3.1939
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]test Loss: 73.5955 Acc: 3.1949
Training complete in 42m 22s
Best val Acc: 3.194888
Tensorflow
Code:
import tensorflow_datasets as tfds
from tensorflow.keras import applications, models
import tensorflow as tf
import time
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])
def resize(ip):
image = ip['image']
label = ip['label']
image = tf.image.resize(image, (224, 224))
image = tf.expand_dims(image,0)
label = tf.one_hot(label,10)
label = tf.expand_dims(label,0)
return (image, label)
ds_train_ = ds_train.map(resize)
ds_test_ = ds_test.map(resize)
model = applications.vgg16.VGG16(input_shape = (224, 224, 3), weights=None, classes=10)
model.compile(optimizer='adam', loss = 'categorical_crossentropy', metrics= ['accuracy'])
batch_size = 32
since = time.time()
history = model.fit(ds_train_,
batch_size = batch_size,
steps_per_epoch = len(ds_train)//batch_size,
epochs = 5,
validation_steps = len(ds_test),
validation_data = ds_test_,
shuffle = True,)
time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format( time_elapsed // 60, time_elapsed % 60 ))
Result:
Epoch 1/5
1562/1562 [==============================] - 125s 69ms/step - loss: 36.9022 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 2/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3031 - accuracy: 0.1005 - val_loss: 2.3033 - val_accuracy: 0.1000
Epoch 3/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3035 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 4/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3038 - accuracy: 0.1024 - val_loss: 2.3030 - val_accuracy: 0.1000
Epoch 5/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3028 - accuracy: 0.1024 - val_loss: 2.3033 - val_accuracy: 0.1000
Training complete in 11m 23s
It is because in your tensorflow codes, the data pipeline is feeding a batch of 1 image into the model per step instead of a batch of 32 images.
Passing batch_size into model.fit does not really control the batch size when the data is in the form of datasets. The reason why it showed a seemingly correct steps per epoch from the log is that you passed steps_per_epoch into model.fit.
To correctly set the batch size:
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])
def resize(ip):
image = ip['image']
label = ip['label']
image = tf.image.resize(image, (224, 224))
label = tf.one_hot(label,10)
return (image, label)
train_size=len(ds_train)
test_size=len(ds_test)
ds_train_ = ds_train.shuffle(train_size).batch(32).map(resize)
ds_test_ = ds_test.shuffle(test_size).batch(32).map(resize)
model.fit call:
history = model.fit(ds_train_,
epochs = 1,
validation_data = ds_test_)
After fixed the problem, tensorflow got similar speed performance with pytorch. In my machine, pytorch took ~27 minutes per epoch while tensorflow took ~24 minutes per epoch.
According to the benchmarks from NVIDIA, pytorch and tensorflow had similar speed performance in most popular deep learning applications with real-world datasets and problem size. (Reference: https://developer.nvidia.com/deep-learning-performance-training-inference)

Tensorflow NN implementation not converging

I'm trying to implement a simple feedforward Neural Network using only Tensorflow and it is not converging. I'm not sure if the problem is in the architecture of the network or the training process implementation. Simple 2 layer NN built using Keras seems to be converging fine:
from keras.layers import LSTM, Dense, Flatten, Conv1D
from keras import Sequential
model = Sequential()
model.add(Dense(32, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(21, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(np.array(train_in), np.array(train_target), epochs=10, validation_split=0.1, batch_size=16)
Epoch 2/10 59717/59717 [==============================] - 4s 71us/sample - loss: 1.4021 - accuracy: 0.6812 - val_loss: 1.1049 - val_accuracy: 0.7066
Epoch 3/10
59717/59717 [==============================] - 4s 70us/sample - loss: 1.0942 - accuracy: 0.7321 - val_loss: 1.2269 - val_accuracy: 0.7015
Epoch 4/10
59717/59717 [==============================] - 4s 70us/sample - loss: 0.9096 - accuracy: 0.7654 - val_loss: 0.8207 - val_accuracy: 0.7905
Epoch 5/10
59717/59717 [==============================] - 4s 70us/sample - loss: 0.8373 - accuracy: 0.7790 - val_loss: 0.6863 - val_accuracy: 0.8267
Epoch 6/10
59717/59717 [==============================] - 4s 72us/sample - loss: 0.7925 - accuracy: 0.7918 - val_loss: 0.8132 - val_accuracy: 0.7929
Epoch 7/10
59717/59717 [==============================] - 4s 73us/sample - loss: 0.7916 - accuracy: 0.7925 - val_loss: 0.6749 - val_accuracy: 0.8210
Epoch 8/10
19600/59717 [========>.....................] - ETA: 2s - loss: 0.7475 - accuracy: 0.8011
Here's my implementation of the same network in Tensorflow:
tf.compat.v1.disable_eager_execution()
batch_size = 10
hid_dim = 32
output_dim = 21
features = train_x.shape[1]
x = tf.compat.v1.placeholder(tf.float32, (batch_size, features), name='x')
y = tf.compat.v1.placeholder(tf.int32, (batch_size, ), name='y')
w1 = tf.Variable(tf.compat.v1.random_normal([features, hid_dim]), dtype=tf.float32)
b1 = tf.Variable(tf.compat.v1.random_normal([hid_dim]), dtype=tf.float32)
w2 = tf.Variable(tf.compat.v1.random_normal([hid_dim, output_dim]), dtype=tf.float32)
b2 = tf.Variable(tf.compat.v1.random_normal([output_dim]), dtype=tf.float32)
h1 = tf.nn.relu(tf.matmul(x, w1) + b1)
h2 = tf.matmul(h1, w2) + b2
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=h2, labels=y))
optimizer = tf.compat.v1.train.AdamOptimizer(0.001).minimize(loss)
pred = tf.nn.softmax(h2)
And here's my training procedure implementation. In my case batch_size is fixed so at each epoch I feed the whole dataset to the network batch by batch. I calculate loss after each batch and add it to array. After each epoch I take average of the epoch's batch losses array to get my overall epoch loss:
train_in = np.array(train_x)
train_target = np.array(train_y)
train_target = np.squeeze(train_target)
y_t = train_target
num_of_train_batches = len(train_in)/batch_size
init=tf.compat.v1.global_variables_initializer()
print('TRAIN BATCHES: ', num_of_train_batches)
epoch_list = []
epoch_losses = []
epochs = 50
with tf.compat.v1.Session() as sess:
sess.run(init)
print('TRAINING')
for epoch in range(epochs):
lt = []
ft = 0
tt = 1
train_losses = []
print('EPOCH: ', epoch)
epoch_list.append(epoch)
# RUN WHOLE SET
for it in range(int(num_of_train_batches)): #len(x_train)/batch_size
# OPTIMIZE
_, batch_loss = sess.run([optimizer, loss], feed_dict={x:train_in[ft*batch_size:tt*batch_size],
y:train_target[ft*batch_size:tt*batch_size]})
train_losses.append(batch_loss)
ft+=1
tt+=1
epoch_losses.append(np.array(train_losses).mean())
print('EPOCH: ', epoch)
print('LOSS: ', np.array(train_losses).mean())
TRAIN BATCHES: 2200.0
TRAINING
EPOCH: 0
EPOCH: 0
LOSS: 1370.9271
EPOCH: 1
EPOCH: 1
LOSS: 64.23466
EPOCH: 2
EPOCH: 2
LOSS: 36.015495
EPOCH: 3
EPOCH: 3
LOSS: 30.292429
EPOCH: 4
EPOCH: 4
LOSS: 26.436918
EPOCH: 5
EPOCH: 5
LOSS: 25.689302
EPOCH: 6
EPOCH: 6
LOSS: 23.730627
EPOCH: 7
EPOCH: 7
LOSS: 22.356762
EPOCH: 8
EPOCH: 8
LOSS: 21.81124
My Keras implementation reached 0.75 loss only after 8 epochs using the same number of hidden layers and hidden layer size, but my TF implementation is still showing greater than 10 loss even after 15 epochs.
Can someone please point out why is this hapenning? I'm guessing the problem has more to do with training procedure than actual NN.
All suggestions are welcomed!

Significantly higher testing accuracy on mnist with keras than tensorflow.keras

I was verifying with a basic example my TensorFlow (v2.2.0), Cuda (10.1), and cudnn (libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb) and I'm getting weird results...
When running the following example in Keras as shown in https://keras.io/examples/mnist_cnn/ I get the ~99% acc #validation. When I adapt the imports run via the TensorFlow I get only 86%.
I might be forgetting something.
To run using tensorflow:
from __future__ import print_function
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import backend as K
batch_size = 128
num_classes = 10
epochs = 12
# input image dimensions
img_rows, img_cols = 28, 28
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
if K.image_data_format() == 'channels_first':
x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
input_shape = (1, img_rows, img_cols)
else:
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.optimizers.Adadelta(),
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Sadly, I get the following output:
Epoch 2/12
469/469 [==============================] - 3s 6ms/step - loss: 2.2245 - accuracy: 0.2633 - val_loss: 2.1755 - val_accuracy: 0.4447
Epoch 3/12
469/469 [==============================] - 3s 7ms/step - loss: 2.1485 - accuracy: 0.3533 - val_loss: 2.0787 - val_accuracy: 0.5147
Epoch 4/12
469/469 [==============================] - 3s 6ms/step - loss: 2.0489 - accuracy: 0.4214 - val_loss: 1.9538 - val_accuracy: 0.6021
Epoch 5/12
469/469 [==============================] - 3s 6ms/step - loss: 1.9224 - accuracy: 0.4845 - val_loss: 1.7981 - val_accuracy: 0.6611
Epoch 6/12
469/469 [==============================] - 3s 6ms/step - loss: 1.7748 - accuracy: 0.5376 - val_loss: 1.6182 - val_accuracy: 0.7039
Epoch 7/12
469/469 [==============================] - 3s 6ms/step - loss: 1.6184 - accuracy: 0.5750 - val_loss: 1.4296 - val_accuracy: 0.7475
Epoch 8/12
469/469 [==============================] - 3s 7ms/step - loss: 1.4612 - accuracy: 0.6107 - val_loss: 1.2484 - val_accuracy: 0.7719
Epoch 9/12
469/469 [==============================] - 3s 6ms/step - loss: 1.3204 - accuracy: 0.6402 - val_loss: 1.0895 - val_accuracy: 0.7945
Epoch 10/12
469/469 [==============================] - 3s 6ms/step - loss: 1.2019 - accuracy: 0.6650 - val_loss: 0.9586 - val_accuracy: 0.8097
Epoch 11/12
469/469 [==============================] - 3s 7ms/step - loss: 1.1050 - accuracy: 0.6840 - val_loss: 0.8552 - val_accuracy: 0.8216
Epoch 12/12
469/469 [==============================] - 3s 7ms/step - loss: 1.0253 - accuracy: 0.7013 - val_loss: 0.7734 - val_accuracy: 0.8337
Test loss: 0.7734305262565613
Test accuracy: 0.8337000012397766
Nowhere near 99.25% as when I import Keras.
What am I missing?
Discrepancy in optimiser parameters between keras and tensorflow.keras
So the crux of the issue lies in the different default parameters for the Adadelta optimisers in Keras and Tensorflow. Specifically, the different learning rates. We can see this with a simple check. Using the Keras version of the code, print(keras.optimizers.Adadelta().get_config()) outpus
{'learning_rate': 1.0, 'rho': 0.95, 'decay': 0.0, 'epsilon': 1e-07}
And in the Tensorflow version, print(tf.optimizers.Adadelta().get_config() gives us
{'name': 'Adadelta', 'learning_rate': 0.001, 'decay': 0.0, 'rho': 0.95, 'epsilon': 1e-07}
As we can see, there is a discrepancy between the learning rates for the Adadelta optimisers. Keras has a default learning rate of 1.0 while Tensorflow has a default learning rate of 0.001 (consistent with their other optimisers).
Effects of a higher learning rate
Since the Keras version of the Adadelta optimiser has a larger learning rate, it converges much faster and achieves a high accuracy within 12 epochs, while the Tensorflow Adadelta optimiser requires a longer training time. If you increased the number of training epochs, the Tensorflow model could potentially achieve a 99% accuracy as well.
The fix
But instead of increasing the training time, we can simply initialise the Tensorflow model to behave in a similar way to the Keras model by changing the learning rate of Adadelta to 1.0. i.e.
model.compile(
loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.optimizers.Adadelta(learning_rate=1.0), # Note the new learning rate
metrics=['accuracy'])
Making this change, we get the following performance running on Tensorflow:
Epoch 12/12
60000/60000 [==============================] - 102s 2ms/sample - loss: 0.0287 - accuracy: 0.9911 - val_loss: 0.0291 - val_accuracy: 0.9907
Test loss: 0.029134796149221757
Test accuracy: 0.9907
which is close to the desired 99.25% accuracy.
p.s. Incidentally, it seems that the different default parameters between Keras and Tensorflow is a known issue that was fixed but then reverted:
https://github.com/keras-team/keras/pull/12841 software development is hard.

model.fit_generator() fails with use_multiprocessing=True

In the code example below, I can train the model only when NOT using multiprocessing.
My generator is straight from the tensorflow.keras.utils.Sequence description https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence
Any idea how to fix the generator to allow multiprocessing?
Running on Win 10, tensorflow 1.13.1, python 3.6.8
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.utils import Sequence
# Generator
class DataGenerator(Sequence):
def __init__(self, dim, batch_size, n_channels):
self.dim = dim
self.batch_size = batch_size
self.n_channels = n_channels
def __len__(self):
return 100
def __getitem__(self, idx):
X = np.random.randn(self.batch_size, self.dim, self.n_channels)
Y = np.random.randn(self.batch_size, self.dim, 1)
return X, Y
dim= 32
batch_size= 64
n_channels= 3
# Generators
training_generator = DataGenerator(dim, batch_size, n_channels)
validation_generator = DataGenerator(dim, batch_size, n_channels)
# Model
model = Sequential()
model.add(layers.GRU(128, return_sequences=True,
batch_input_shape=[None, training_generator.dim, training_generator.n_channels]))
model.add(layers.Dense(1))
model.compile(loss='mse', optimizer='adam')
# This training procedure runs
model.fit_generator(generator=training_generator,
epochs = 2,
steps_per_epoch = 100,
max_queue_size = 32,
validation_data=validation_generator,
validation_steps = 20,
verbose=1)
# This training procedure fails (Only change is that I added the multiprocessing options)
model.fit_generator(generator=training_generator,
epochs = 2,
steps_per_epoch = 100,
max_queue_size = 32,
validation_data=validation_generator,
validation_steps = 20,
verbose=1,
use_multiprocessing=True,
workers=4)
I expected the second fit_generator() call to train the model like the first one. Instead, I get no output, not even an error message.
I tried your code on Ubuntu 18.04.2 LTS machine with python 3.6.8 and tensorflow 1.13.1. It works in both cases as log shown below:
2019-07-13 12:56:17.003119: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
100/100 [==============================] - 3s 27ms/step - loss: 0.9987
100/100 [==============================] - 10s 103ms/step - loss: 0.9973 - val_loss: 0.9987
Epoch 2/2
100/100 [==============================] - 3s 26ms/step - loss: 0.9955
100/100 [==============================] - 8s 83ms/step - loss: 1.0028 - val_loss: 0.9955
Multiprocessing=True ......
Epoch 1/2
100/100 [==============================] - 3s 32ms/step - loss: 0.9952
100/100 [==============================] - 9s 89ms/step - loss: 0.9962 - val_loss: 0.9952
Epoch 2/2
100/100 [==============================] - 3s 28ms/step - loss: 0.9967
100/100 [==============================] - 9s 86ms/step - loss: 0.9968 - val_loss: 0.9967"
My suggestion is to first try with CPU only mode, by putting BOTH the model and the fit_generator code under "with tf.device('/cpu:0'):". If it works, it would be GPU related issue, such as proper driver, tensorflow with GPU support etc. Most likely, the issue was caused by GPU hanging.