Tensorflow NN implementation not converging - tensorflow

I'm trying to implement a simple feedforward Neural Network using only Tensorflow and it is not converging. I'm not sure if the problem is in the architecture of the network or the training process implementation. Simple 2 layer NN built using Keras seems to be converging fine:
from keras.layers import LSTM, Dense, Flatten, Conv1D
from keras import Sequential
model = Sequential()
model.add(Dense(32, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(21, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(np.array(train_in), np.array(train_target), epochs=10, validation_split=0.1, batch_size=16)
Epoch 2/10 59717/59717 [==============================] - 4s 71us/sample - loss: 1.4021 - accuracy: 0.6812 - val_loss: 1.1049 - val_accuracy: 0.7066
Epoch 3/10
59717/59717 [==============================] - 4s 70us/sample - loss: 1.0942 - accuracy: 0.7321 - val_loss: 1.2269 - val_accuracy: 0.7015
Epoch 4/10
59717/59717 [==============================] - 4s 70us/sample - loss: 0.9096 - accuracy: 0.7654 - val_loss: 0.8207 - val_accuracy: 0.7905
Epoch 5/10
59717/59717 [==============================] - 4s 70us/sample - loss: 0.8373 - accuracy: 0.7790 - val_loss: 0.6863 - val_accuracy: 0.8267
Epoch 6/10
59717/59717 [==============================] - 4s 72us/sample - loss: 0.7925 - accuracy: 0.7918 - val_loss: 0.8132 - val_accuracy: 0.7929
Epoch 7/10
59717/59717 [==============================] - 4s 73us/sample - loss: 0.7916 - accuracy: 0.7925 - val_loss: 0.6749 - val_accuracy: 0.8210
Epoch 8/10
19600/59717 [========>.....................] - ETA: 2s - loss: 0.7475 - accuracy: 0.8011
Here's my implementation of the same network in Tensorflow:
tf.compat.v1.disable_eager_execution()
batch_size = 10
hid_dim = 32
output_dim = 21
features = train_x.shape[1]
x = tf.compat.v1.placeholder(tf.float32, (batch_size, features), name='x')
y = tf.compat.v1.placeholder(tf.int32, (batch_size, ), name='y')
w1 = tf.Variable(tf.compat.v1.random_normal([features, hid_dim]), dtype=tf.float32)
b1 = tf.Variable(tf.compat.v1.random_normal([hid_dim]), dtype=tf.float32)
w2 = tf.Variable(tf.compat.v1.random_normal([hid_dim, output_dim]), dtype=tf.float32)
b2 = tf.Variable(tf.compat.v1.random_normal([output_dim]), dtype=tf.float32)
h1 = tf.nn.relu(tf.matmul(x, w1) + b1)
h2 = tf.matmul(h1, w2) + b2
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=h2, labels=y))
optimizer = tf.compat.v1.train.AdamOptimizer(0.001).minimize(loss)
pred = tf.nn.softmax(h2)
And here's my training procedure implementation. In my case batch_size is fixed so at each epoch I feed the whole dataset to the network batch by batch. I calculate loss after each batch and add it to array. After each epoch I take average of the epoch's batch losses array to get my overall epoch loss:
train_in = np.array(train_x)
train_target = np.array(train_y)
train_target = np.squeeze(train_target)
y_t = train_target
num_of_train_batches = len(train_in)/batch_size
init=tf.compat.v1.global_variables_initializer()
print('TRAIN BATCHES: ', num_of_train_batches)
epoch_list = []
epoch_losses = []
epochs = 50
with tf.compat.v1.Session() as sess:
sess.run(init)
print('TRAINING')
for epoch in range(epochs):
lt = []
ft = 0
tt = 1
train_losses = []
print('EPOCH: ', epoch)
epoch_list.append(epoch)
# RUN WHOLE SET
for it in range(int(num_of_train_batches)): #len(x_train)/batch_size
# OPTIMIZE
_, batch_loss = sess.run([optimizer, loss], feed_dict={x:train_in[ft*batch_size:tt*batch_size],
y:train_target[ft*batch_size:tt*batch_size]})
train_losses.append(batch_loss)
ft+=1
tt+=1
epoch_losses.append(np.array(train_losses).mean())
print('EPOCH: ', epoch)
print('LOSS: ', np.array(train_losses).mean())
TRAIN BATCHES: 2200.0
TRAINING
EPOCH: 0
EPOCH: 0
LOSS: 1370.9271
EPOCH: 1
EPOCH: 1
LOSS: 64.23466
EPOCH: 2
EPOCH: 2
LOSS: 36.015495
EPOCH: 3
EPOCH: 3
LOSS: 30.292429
EPOCH: 4
EPOCH: 4
LOSS: 26.436918
EPOCH: 5
EPOCH: 5
LOSS: 25.689302
EPOCH: 6
EPOCH: 6
LOSS: 23.730627
EPOCH: 7
EPOCH: 7
LOSS: 22.356762
EPOCH: 8
EPOCH: 8
LOSS: 21.81124
My Keras implementation reached 0.75 loss only after 8 epochs using the same number of hidden layers and hidden layer size, but my TF implementation is still showing greater than 10 loss even after 15 epochs.
Can someone please point out why is this hapenning? I'm guessing the problem has more to do with training procedure than actual NN.
All suggestions are welcomed!

Related

Transfer learning model not learning much, validation plateau at 45% while train go up to 90%

So it's been days i've been working on this model on image classification. I have 70000 images and 375 classes. I've tried training it with Vgg16, Xception, Resnet & Mobilenet ... and I always get the same limit of 45% on the validation.
As you can see here
I've tried adding dropout layers and regularization and it gets the same result for validation.
Data augmentation didn't do much to help either
Any ideas why this isn't working ?
Here's a snipped of the code of the last model I used:
from keras.models import Sequential
from keras.layers import Dense
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from keras import regularizers
from PIL import Image
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
validation_datagen = ImageDataGenerator(rescale=1./255)
target_size = (height, width)
datagen = ImageDataGenerator(rescale=1./255,
validation_split=0.2)
train_generator = datagen.flow_from_directory(
path,
target_size=(height, width),
batch_size=batchSize,
shuffle=True,
class_mode='categorical',
subset='training')
validation_generator = datagen.flow_from_directory(
path,
target_size=(height, width),
batch_size=batchSize,
class_mode='categorical',
subset='validation')
num_classes = len(train_generator.class_indices)
xception_model = Xception(weights='imagenet',input_shape=(width, height, 3), include_top=False,classes=num_classes)
x = xception_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
out = Dense(num_classes, activation='softmax')(x)
opt = Adam()
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
n_epochs = 15
history = model.fit(
train_generator,
steps_per_epoch = train_generator.samples // batchSize,
validation_data = validation_generator,
validation_steps = validation_generator.samples // batchSize,
verbose=1,
epochs = n_epochs)
Yes, you may need a balanced dataset among each category in your dataset for better model training performance. Please try again by changing class_mode='sparse' and loss='sparse_categorical_crossentropy' because you are using the image dataset. Also freeze the pretrained model layers 'xception_model.trainable = False'.
Check the below code: (I have used a flower dataset of 5 classes)
xception_model = tf.keras.applications.Xception(weights='imagenet',input_shape=(width, height, 3), include_top=False,classes=num_classes)
xception_model.trainable = False
x = xception_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(32, activation='relu')(x)
out = Dense(num_classes, activation='softmax')(x)
opt = tf.keras.optimizers.Adam()
model = keras.Model(inputs=xception_model.input, outputs=out)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(train_generator, epochs=10, validation_data=validation_generator)
Output:
Epoch 1/10
217/217 [==============================] - 23s 95ms/step - loss: 0.5945 - accuracy: 0.7793 - val_loss: 0.4610 - val_accuracy: 0.8337
Epoch 2/10
217/217 [==============================] - 20s 91ms/step - loss: 0.3439 - accuracy: 0.8797 - val_loss: 0.4550 - val_accuracy: 0.8419
Epoch 3/10
217/217 [==============================] - 20s 93ms/step - loss: 0.2570 - accuracy: 0.9150 - val_loss: 0.4437 - val_accuracy: 0.8384
Epoch 4/10
217/217 [==============================] - 20s 91ms/step - loss: 0.2040 - accuracy: 0.9340 - val_loss: 0.4592 - val_accuracy: 0.8477
Epoch 5/10
217/217 [==============================] - 20s 91ms/step - loss: 0.1649 - accuracy: 0.9494 - val_loss: 0.4686 - val_accuracy: 0.8512
Epoch 6/10
217/217 [==============================] - 20s 92ms/step - loss: 0.1301 - accuracy: 0.9589 - val_loss: 0.4805 - val_accuracy: 0.8488
Epoch 7/10
217/217 [==============================] - 20s 93ms/step - loss: 0.0966 - accuracy: 0.9754 - val_loss: 0.4993 - val_accuracy: 0.8442
Epoch 8/10
217/217 [==============================] - 20s 91ms/step - loss: 0.0806 - accuracy: 0.9806 - val_loss: 0.5488 - val_accuracy: 0.8372
Epoch 9/10
217/217 [==============================] - 20s 91ms/step - loss: 0.0623 - accuracy: 0.9864 - val_loss: 0.5802 - val_accuracy: 0.8360
Epoch 10/10
217/217 [==============================] - 22s 100ms/step - loss: 0.0456 - accuracy: 0.9896 - val_loss: 0.6005 - val_accuracy: 0.8360

Keras multi target classification model test accuracy is less than train accuracy

I am building multi classification model, after fit method, getting test accuracy is very very less than train accuracy even drop outs layers attached.
The layers definition is as follows.
n_labels = len(unique(X_train_enc))
in_layer = Input(shape=(1,))
em_layer = Embedding(input_dim = int(n_labels)+1,output_dim = 7,
input_length = 1, name="embedding")(in_layer)
dense = Dense(512, activation='relu', kernel_initializer='he_normal')(em_layer)
dense = Dropout(0.2)(dense)
output = Dense(7, activation='softmax')(dense)
model = Model(inputs=in_layer, outputs=output)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train_enc, y_train_enc, epochs=100, batch_size=100, verbose=1)
Epoch 98/100
572/572 [==============================] - 11s 19ms/step - loss: 3.7565e-11 - accuracy: 1.0000
Epoch 99/100
313/572 [===============>..............] - ETA: 5s - loss: 2.2852e-11 - accuracy: 1.0000
test_loss, test_acc = model.evaluate(X_test_enc, y_test_enc,verbose=1)
print('Test Accuracy: ', test_acc, '\nTest Loss: ', test_loss)
print('Accuracy: %.2f' % (test_acc*100))
766/766 [==============================] - 7s 9ms/step - loss: 20.4420 - accuracy: 0.1426
Test Accuracy: 0.1426003873348236
Test Loss: 20.441951751708984
Accuracy: 14.26
any tuning will be highly appreciated.

Loss when starting fine tuning is higher than loss from transfer learning

Since I start fine tuning with the weights learned by transfer learning, I would expect the loss to be the same or less. However it looks like it starts fine tuning using a different set of starting weights.
Code to start transfer learning:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
base_model.trainable = False
model = tf.keras.Sequential([
base_model,
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dense(units=3, activation='sigmoid')
])
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
epochs = 1000
callback = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(train_generator,
steps_per_epoch=len(train_generator),
epochs=epochs,
validation_data=val_generator,
validation_steps=len(val_generator),
callbacks=[callback],)
Output from last epoch:
Epoch 29/1000
232/232 [==============================] - 492s 2s/step - loss: 0.1298 - accuracy: 0.8940 - val_loss: 0.1220 - val_accuracy: 0.8937
Code to start fine tuning:
model.trainable = True
# Fine-tune from this layer onwards
fine_tune_at = -20
# Freeze all the layers before the `fine_tune_at` layer
for layer in model.layers[:fine_tune_at]:
layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
loss='binary_crossentropy',
metrics=['accuracy'])
history_fine = model.fit(train_generator,
steps_per_epoch=len(train_generator),
epochs=epochs,
validation_data=val_generator,
validation_steps=len(val_generator),
callbacks=[callback],)
But this is what I see for the first few epochs:
Epoch 1/1000
232/232 [==============================] - ETA: 0s - loss: 0.3459 - accuracy: 0.8409/usr/local/lib/python3.7/dist-packages/PIL/Image.py:960: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
"Palette images with Transparency expressed in bytes should be "
232/232 [==============================] - 509s 2s/step - loss: 0.3459 - accuracy: 0.8409 - val_loss: 0.7755 - val_accuracy: 0.7262
Epoch 2/1000
232/232 [==============================] - 502s 2s/step - loss: 0.1889 - accuracy: 0.9066 - val_loss: 0.5628 - val_accuracy: 0.8881
Eventually the loss drops and passes the transfer learning loss:
Epoch 87/1000
232/232 [==============================] - 521s 2s/step - loss: 0.0232 - accuracy: 0.8312 - val_loss: 0.0481 - val_accuracy: 0.8563
Why was the loss in the first epoch of fine tuning higher than the last loss from transfer learning?

Classification with PyTorch is much slower than Tensorflow: 42min vs. 11min

I have been a Tensorflow user and start to use Pytorch. As a trial, I implemented simple classification tasks with both libraries.
However, PyTorch is much slower than Tensorflow: Pytorch takes 42min while TensorFlow 11min. I referred to PyTorch official Tutorial, and made only little change from it.
Could anyone share some advice for this problem?
Here is the summary what I tried.
environment: Colab Pro+
dataset: Cifar10
classifier: VGG16
optimizer: Adam
loss: crossentropy
batch size: 32
PyTorch
Code:
import torch, torchvision
from torch import nn
from torchvision import transforms, models
from tqdm import tqdm
import time, copy
trans = transforms.Compose([transforms.Resize((224, 224)),
transforms.ToTensor(),])
data = {phase: torchvision.datasets.CIFAR10('./', train = (phase=='train'), transform=trans, download=True) for phase in ['train', 'test']}
dataloaders = {phase: torch.utils.data.DataLoader(data[phase], batch_size=32, shuffle=True) for phase in ['train', 'test']}
def train_model(model, criterion, optimizer, dataloaders, device, num_epochs=5):
since = time.time()
best_model_wts = copy.deepcopy(model.state_dict())
best_acc = 0.0
for epoch in range(num_epochs):
print('Epoch {}/{}'.format(epoch, num_epochs - 1))
print('-' * 10)
# Each epoch has a training and validation phase
for phase in ['train', 'test']:
if phase == 'train':
model.train() # Set model to training mode
else:
model.eval() # Set model to evaluate mode
running_loss = 0.0
running_corrects = 0
# Iterate over data.
for inputs, labels in tqdm(iter(dataloaders[phase])):
inputs = inputs.to(device)
labels = labels.to(device)
# zero the parameter gradients
optimizer.zero_grad()
# forward
# track history if only in train
with torch.set_grad_enabled(phase == 'train'):
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)
# backward + optimize only if in training phase
if phase == 'train':
loss.backward()
optimizer.step()
# statistics
running_loss += loss.item() * inputs.size(0)
running_corrects += torch.sum(preds == labels.data)
epoch_loss = running_loss / len(dataloaders[phase])
epoch_acc = running_corrects.double() / len(dataloaders[phase])
print('{} Loss: {:.4f} Acc: {:.4f}'.format(
phase, epoch_loss, epoch_acc))
# deep copy the model
if phase == 'test' and epoch_acc > best_acc:
best_acc = epoch_acc
best_model_wts = copy.deepcopy(model.state_dict())
print()
time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(
time_elapsed // 60, time_elapsed % 60))
print('Best val Acc: {:4f}'.format(best_acc))
# load best model weights
model.load_state_dict(best_model_wts)
return model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = models.vgg16(pretrained=False)
model = model.to(device)
model = train_model(model=model,
criterion=nn.CrossEntropyLoss(),
optimizer=torch.optim.Adam(model.parameters(), lr=0.001),
dataloaders=dataloaders,
device=device,
)
Result:
Epoch 0/4
----------
0%| | 0/1563 [00:00<?, ?it/s]/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
100%|██████████| 1563/1563 [07:50<00:00, 3.32it/s]
train Loss: 75.5199 Acc: 3.2809
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.7274 Acc: 3.1949
Epoch 1/4
----------
100%|██████████| 1563/1563 [07:50<00:00, 3.33it/s]
train Loss: 73.8162 Acc: 3.2514
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]
test Loss: 73.6114 Acc: 3.1949
Epoch 2/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7741 Acc: 3.1369
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.5873 Acc: 3.1949
Epoch 3/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7493 Acc: 3.1331
100%|██████████| 313/313 [00:38<00:00, 8.12it/s]
test Loss: 73.6191 Acc: 3.1949
Epoch 4/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7289 Acc: 3.1939
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]test Loss: 73.5955 Acc: 3.1949
Training complete in 42m 22s
Best val Acc: 3.194888
Tensorflow
Code:
import tensorflow_datasets as tfds
from tensorflow.keras import applications, models
import tensorflow as tf
import time
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])
def resize(ip):
image = ip['image']
label = ip['label']
image = tf.image.resize(image, (224, 224))
image = tf.expand_dims(image,0)
label = tf.one_hot(label,10)
label = tf.expand_dims(label,0)
return (image, label)
ds_train_ = ds_train.map(resize)
ds_test_ = ds_test.map(resize)
model = applications.vgg16.VGG16(input_shape = (224, 224, 3), weights=None, classes=10)
model.compile(optimizer='adam', loss = 'categorical_crossentropy', metrics= ['accuracy'])
batch_size = 32
since = time.time()
history = model.fit(ds_train_,
batch_size = batch_size,
steps_per_epoch = len(ds_train)//batch_size,
epochs = 5,
validation_steps = len(ds_test),
validation_data = ds_test_,
shuffle = True,)
time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format( time_elapsed // 60, time_elapsed % 60 ))
Result:
Epoch 1/5
1562/1562 [==============================] - 125s 69ms/step - loss: 36.9022 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 2/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3031 - accuracy: 0.1005 - val_loss: 2.3033 - val_accuracy: 0.1000
Epoch 3/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3035 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 4/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3038 - accuracy: 0.1024 - val_loss: 2.3030 - val_accuracy: 0.1000
Epoch 5/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3028 - accuracy: 0.1024 - val_loss: 2.3033 - val_accuracy: 0.1000
Training complete in 11m 23s
It is because in your tensorflow codes, the data pipeline is feeding a batch of 1 image into the model per step instead of a batch of 32 images.
Passing batch_size into model.fit does not really control the batch size when the data is in the form of datasets. The reason why it showed a seemingly correct steps per epoch from the log is that you passed steps_per_epoch into model.fit.
To correctly set the batch size:
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])
def resize(ip):
image = ip['image']
label = ip['label']
image = tf.image.resize(image, (224, 224))
label = tf.one_hot(label,10)
return (image, label)
train_size=len(ds_train)
test_size=len(ds_test)
ds_train_ = ds_train.shuffle(train_size).batch(32).map(resize)
ds_test_ = ds_test.shuffle(test_size).batch(32).map(resize)
model.fit call:
history = model.fit(ds_train_,
epochs = 1,
validation_data = ds_test_)
After fixed the problem, tensorflow got similar speed performance with pytorch. In my machine, pytorch took ~27 minutes per epoch while tensorflow took ~24 minutes per epoch.
According to the benchmarks from NVIDIA, pytorch and tensorflow had similar speed performance in most popular deep learning applications with real-world datasets and problem size. (Reference: https://developer.nvidia.com/deep-learning-performance-training-inference)

Forward Pass calculation on current batch in "get_updates" method of Keras SGD Optimizer

I am trying to implement a stochastic armijo rule in the get_gradient method of Keras SGD optimizer.
Therefore, I need to calculate another forward pass to check if the learning_rate chosen was good. I don't want another calculation of the gradients, but I want to use the updated weights.
Using Keras Version 2.3.1 and Tensorflow Version 1.14.0
def get_updates(self, loss, params):
grads = self.get_gradients(loss, params)
self.updates = [K.update_add(self.iterations, 1)]
lr = self.learning_rate
if self.initial_decay > 0:
lr = lr * (1. / (1. + self.decay * K.cast(self.iterations,
K.dtype(self.decay))))
# momentum
shapes = [K.int_shape(p) for p in params]
moments = [K.zeros(shape, name='moment_' + str(i))
for (i, shape) in enumerate(shapes)]
self.weights = [self.iterations] + moments
for p, g, m in zip(params, grads, moments):
v = self.momentum * m - lr * g # velocity
self.updates.append(K.update(m, v))
if self.nesterov:
new_p = p + self.momentum * v - lr * g
else:
new_p = p + v
# Apply constraints.
if getattr(p, 'constraint', None) is not None:
new_p = p.constraint(new_p)
self.updates.append(K.update(p, new_p))
### own changes ###
if self.armijo:
inputs = (model._feed_inputs +
model._feed_targets +
model._feed_sample_weights)
input_layer = model.layers[0].input
armijo_function = K.function(inputs=input_layer, outputs=[loss],
updates=self.updates,name='armijo')
loss_next= armijo_function(inputs)
[....change updates if learning rate was not good enough...]
return self.updates
Unfortunately, I don't understand the error message when trying to calculate "loss_next":
tensorflow.python.framework.errors_impl.InvalidArgumentError: Requested Tensor connection between nodes "conv2d_1_input" and "conv2d_1_input" would create a cycle.
Two questions here:
how to access the current batch I am working on? The forward calculation should only consider the actual batch and as the gradients also belong only to that batch.
any better ideas to not use K.function for updating and evaluating a forward pass to calculate the loss function on that batch?
Anyone who can help? Thanks in advance.
how to access the current batch I am working on? The forward calculation should only consider the actual batch and as the gradients also belong only to that batch.
For this you can use batch_size = Total training records in model.fit() so that every epoch has just one forward pass and back propagation. Thus you can analysis the gradients on epoch 1 and modify the learning rate for epoch 2 OR if you are using the custom training loop then modify the code accordingly.
any better ideas to not use K.function for updating and evaluating a forward pass to calculate the loss function on that batch?
I do not recall any other option to evaluate gradient apart from using from tensorflow.keras import backend as K in tensorflow version 1.x. The best option is to update tensorflow to latest version 2.2.0 and use tf.GradientTape.
Would recommend to go through this answer to capture gradients using from tensorflow.keras import backend as K in tensorflow 1.x.
Below is a sample code which is almost similar to your requirement. I am using tensorflow version 2.2.0. You can build your requirements from this program.
We are doing below functions in the program -
We are altering the Learning rate after every epoch. You can do that using callbacks argument of model.fit. Here I am incrementing learning rate by 0.01 for every epoch using tf.keras.callbacks.LearningRateScheduler and also displaying it at end of every epoch using tf.keras.callbacks.Callback.
Computing the gradient using tf.GradientTape() after end of every epoch. We are collecting the grads of every epoch to a list using append.
Also have set batch_size=len(train_images)as per your requirement.
Note : I am training on just 500 records from Cifar dataset due to memory constraints.
Code -
%tensorflow_version 2.x
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K
import os
import numpy as np
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()
train_images = train_images[:500]
train_labels = train_labels[:500]
test_images = test_images[:50]
test_labels = test_labels[:50]
model = Sequential([
Conv2D(16, 3, padding='same', activation='relu', input_shape=(32, 32, 3)),
MaxPooling2D(),
Conv2D(32, 3, padding='same', activation='relu'),
MaxPooling2D(),
Conv2D(64, 3, padding='same', activation='relu'),
MaxPooling2D(),
Flatten(),
Dense(512, activation='relu'),
Dense(10)
])
lr = 0.01
adam = Adam(lr)
# Define the Gradient Fucntion
epoch_gradient = []
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Define the Required Callback Function
class GradientCalcCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
with tf.GradientTape() as tape:
logits = model(train_images, training=True)
loss = loss_fn(train_labels, logits)
grad = tape.gradient(loss, model.trainable_weights)
model.optimizer.apply_gradients(zip(grad, model.trainable_variables))
epoch_gradient.append(grad)
gradcalc = GradientCalcCallback()
# Define the Required Callback Function
class printlearningrate(tf.keras.callbacks.Callback):
def on_epoch_begin(self, epoch, logs={}):
optimizer = self.model.optimizer
lr = K.eval(optimizer.lr)
Epoch_count = epoch + 1
print('\n', "Epoch:", Epoch_count, ', LR: {:.2f}'.format(lr))
printlr = printlearningrate()
def scheduler(epoch):
optimizer = model.optimizer
return K.eval(optimizer.lr + 0.01)
updatelr = tf.keras.callbacks.LearningRateScheduler(scheduler)
model.compile(optimizer=adam,
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
epochs = 10
history = model.fit(train_images, train_labels, epochs=epochs, batch_size=len(train_images),
validation_data=(test_images, test_labels),
callbacks = [printlr,updatelr,gradcalc])
# (7) Convert to a 2 dimensiaonal array of (epoch, gradients) type
gradient = np.asarray(epoch_gradient)
print("Total number of epochs run:", epochs)
print("Gradient Array has the shape:",gradient.shape)
Output -
Epoch: 1 , LR: 0.01
Epoch 1/10
1/1 [==============================] - 0s 427ms/step - loss: 30.1399 - accuracy: 0.0820 - val_loss: 2114.8201 - val_accuracy: 0.1800 - lr: 0.0200
Epoch: 2 , LR: 0.02
Epoch 2/10
1/1 [==============================] - 0s 329ms/step - loss: 141.6176 - accuracy: 0.0920 - val_loss: 41.7008 - val_accuracy: 0.0400 - lr: 0.0300
Epoch: 3 , LR: 0.03
Epoch 3/10
1/1 [==============================] - 0s 328ms/step - loss: 4.1428 - accuracy: 0.1160 - val_loss: 2.3883 - val_accuracy: 0.1800 - lr: 0.0400
Epoch: 4 , LR: 0.04
Epoch 4/10
1/1 [==============================] - 0s 329ms/step - loss: 2.3545 - accuracy: 0.1060 - val_loss: 2.3471 - val_accuracy: 0.1800 - lr: 0.0500
Epoch: 5 , LR: 0.05
Epoch 5/10
1/1 [==============================] - 0s 340ms/step - loss: 2.3208 - accuracy: 0.1060 - val_loss: 2.3047 - val_accuracy: 0.1800 - lr: 0.0600
Epoch: 6 , LR: 0.06
Epoch 6/10
1/1 [==============================] - 0s 331ms/step - loss: 2.3048 - accuracy: 0.1300 - val_loss: 2.3069 - val_accuracy: 0.0600 - lr: 0.0700
Epoch: 7 , LR: 0.07
Epoch 7/10
1/1 [==============================] - 0s 337ms/step - loss: 2.3041 - accuracy: 0.1340 - val_loss: 2.3432 - val_accuracy: 0.0600 - lr: 0.0800
Epoch: 8 , LR: 0.08
Epoch 8/10
1/1 [==============================] - 0s 341ms/step - loss: 2.2871 - accuracy: 0.1400 - val_loss: 2.6009 - val_accuracy: 0.0800 - lr: 0.0900
Epoch: 9 , LR: 0.09
Epoch 9/10
1/1 [==============================] - 1s 515ms/step - loss: 2.2810 - accuracy: 0.1440 - val_loss: 2.8530 - val_accuracy: 0.0600 - lr: 0.1000
Epoch: 10 , LR: 0.10
Epoch 10/10
1/1 [==============================] - 0s 343ms/step - loss: 2.2954 - accuracy: 0.1300 - val_loss: 2.3049 - val_accuracy: 0.0600 - lr: 0.1100
Total number of epochs run: 10
Gradient Array has the shape: (10, 10)
Hope this answers your question. Happy Learning.