Keras is saving the wrong epoch with early stopping - tensorflow

I was wondering if anyone else has seen this happen with Keras recently. In the first screenshot, early stopping is triggered (I set patience to be 2 to demonstrate my point) and the best was epoch 4 with a loss of 0.0898. However, when I next run model.evaluate, you can see that the actual model that was saved was from epoch 5. On every run I've tried, it consistently saves the epoch that is one after the best epoch.
Code:
initializer = keras.initializers.RandomUniform(minval= -0.2, maxval=0.5)
model = keras.Sequential([
keras.layers.Dense(2, activation=K.elu, input_shape=[3], kernel_initializer=initializer),
keras.layers.Dense(2, activation=K.elu, kernel_initializer=initializer),
keras.layers.Dense(2, kernel_initializer=initializer)
])
rms = keras.optimizers.RMSprop(0.01)
model.compile(loss='mean_absolute_error',optimizer=rms)
es = keras.callbacks.EarlyStopping(
monitor='loss',
mode='min',
patience = 10,
restore_best_weights = True,
verbose=0)
historyData = model.fit(xarray_norm,yarray_norm,epochs=5,callbacks=[es])
training_eval = model.evaluate(xarray_norm,yarray_norm,verbose=1)
print()
print('training loss = ', training_eval)
print()
print('History of Loss:')
print(historyData.history)
Also, when early stop is not triggered, there appears to be a ghost extra epoch that is run and then stored but is neither printer nor shows up in the history:
I've run this code multiple times and what's shown in the screenshots consistently happens. It's not a big deal if you drive the learning rate down so there aren't a lot of fluctuations between epochs but the early stop doesn't appear to be working the way it should.
Edited: I see that the fit output could be impacted by the average over the different batches. But here is the consistency that shows that the restore best weights is always picking the epoch after the best one - the model.evaluate and model.fit outputs for loss are identical:
[1]: https://i.stack.imgur.com/BuDOn.png
[2]: https://i.stack.imgur.com/Th91D.png
[3]: https://i.stack.imgur.com/s6y9K.png
[4]: https://i.stack.imgur.com/1OyT5.png

Related

I am trying to resume training from a certain checkpoint

I'm facing a problem with restoring training from the last checkpoint that I saved. I'm following exactly this code except that I'm changing the dataset and increasing the number of epochs to 100: Machine Translation French-English notebook
What do I add in order to keep the training because it wouldn't finish in one days and every time it re-starts from epoch 1.
I've found a similar question but the answer didn't solve the problem: Resume training from a certain checkpoint.
I know this is late but I wanted to share the code of a possible solution to this.
Saving a checkpoint and restoring the model from it is pretty easy according to the Tensorflow documentation. The saving can be done using the Tensorflow callbacks every epoch (or with a save_freq additional argument every x epochs):
model.compile(..., metrics=['accuracy'])
EPOCHS = 10
checkpoint_filepath = '/path/to/checkpoint'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_filepath,
save_weights_only=True,
monitor='val_accuracy',
mode='max',
save_best_only=True # if this is not the best epoch so far it is not saved.
)
model.fit(epochs=EPOCHS, callbacks=[model_checkpoint_callback])
Then, before starting a new train, or doing prediction, the weights of the saved checkpoint can be loaded like this:
model.load_weights(checkpoint_filepath)
That's it.

Multiple calls to fit when using LearningRateSchedule

if I "compile" a Keras model with an optimizer using a LearningRateSchedule and run model.fit() for one epoch several times, will it restart the learning rate scheduler every time or will it preserve its state?
model = create_keras_model()
lr_scheduler = create_lr_scheduler()
optimizer = Adam(learning_rate=lr_scheduler)
for i in range(10)
model.fit(dataset, epochs=1)
Thanks.

model.evaluate() varies wildly with number of steps when using generators

Running tensorflow 2.x in Colab with its internal keras version (tf.keras). My model is a 3D convolutional UNET for multiclass segmentation (not sure if it's relevant).
I've successfully trained (high enough accuracy on validation) this model the traditional way but I'd like to do augmentation to improve it, therefore I'm switching to (hand-written) generators. When I use generators I see my loss increasing and my accuracy decreasing a lot (e.g.: loss increasing 4-fold, not some %) in the fit.
To try to localize the issue I've tried loading my trained weights and computing the metrics on the data returned by the generators. And what's happening makes no sense. I can see that the results visually are ok.
model.evaluate(validationGenerator,steps=1)
2s 2s/step - loss: 0.4037 - categorical_accuracy: 0.8716
model.evaluate(validationGenerator,steps=2)
2s/step - loss: 1.7825 - categorical_accuracy: 0.7158
model.evaluate(validationGenerator,steps=4)
7s 2s/step - loss: 1.7478 - categorical_accuracy: 0.7038
Why would the loss vary with the number of steps? I could guess some % due to statistical variations... not 4 fold increase!
If I try
x,y = next(validationGenerator)
nSamples = x.shape[0]
meanLoss = np.zeros(nSamples)
meanAcc = np.zeros(nSamples)
for pIdx in range(nSamples):
y_pred = model.predict(np.expand_dims(x[pIdx,:,:,:,:],axis=0))
meanAcc[pIdx]=np.mean(tf.keras.metrics.categorical_accuracy(np.expand_dims(y[pIdx,:,:,:,:],axis=0),y_pred))
meanLoss[pIdx]=np.mean(tf.keras.metrics.categorical_crossentropy(np.expand_dims(y[pIdx,:,:,:,:],axis=0),y_pred))
print(np.mean(meanAcc))
print(np.mean(meanLoss))
I get accuracy~85% and loss ~0.44. Which is what I expect from the previous fit, and it varies by vary little from one batch to the other. And these are the same exact numbers that I get if I do model.evaluate() with 1 step (using the same generator function).
However I need about 30 steps to run trough my whole training dataset. What should I do?
If I fit my already good model to this generator it indeed worsen the performances a lot (it goes from a nice segmentation of the image to uniform predictions of 25% for each of the 4 classes!!!!)
Any idea on where to debud the issue? I've also visually looked at the images produced by the generator and at the model predictions and everything looks correct (as testified by the numbers I found when evaluating using a single step). I've tried writing a minimal working example with a 2 layers model but... in it the issue does not happen.
UPDATE: Generators code
So, as I've been asked, these are the generators code. They're handwritten
def dataGen (X,Y_train):
patchS = 64 #set the size of the patch I extract
batchS = 16 #number of samples per batch
nSamples = X.shape[0] #get total number of samples
immSize = X.shape[1:] #get the shape of the iamge to crop
#Get 4 patches from each image
#extract them randomly, and in random patient order
patList = np.array(range(0,nSamples),dtype='int16')
patList = patList.reshape(nSamples,1)
patList = np.tile(patList,(4,2))
patList[:nSamples,0]=0 #Use this index to tell the code where to get the patch from
patList[nSamples:2*nSamples,0]=1
patList[2*nSamples:3*nSamples,0]=2
patList[3*nSamples:4*nSamples,0]=3
np.random.shuffle(patList)
patStart=0
Xout = np.zeros((batchS,patchS,patchS,patchS,immSize[3])) #allocate output vector
while True:
Yout = np.zeros((batchS,patchS,patchS,patchS)) #allocate vector of labels
for patIdx in range(batchS):
XSR = 32* (patList[patStart+patIdx,0]//2) #get the index of where to extract the patch
YSR = 32* (patList[patStart+patIdx,0]%2)
xStart = random.randrange(XSR,XSR+32) #get a patch randomly somewhere between a range
yStart = random.randrange(YSR,YSR+32)
zStart = random.randrange(0,26)
patInd = patList[patStart+patIdx,1]
Xout[patIdx,:,:,:,:] = X[patInd,xStart:(xStart+patchS),yStart:(yStart+patchS),zStart:(zStart+patchS),:]
Yout[patIdx,:,:,:] = Y_train[patInd,xStart:(xStart+patchS),yStart:(yStart+patchS),zStart:(zStart+patchS)]
if((patStart+patIdx)>(patList.shape[0]-2)):
np.random.shuffle(patList) #after going through the whole list restart
patStart=0
patStart = patStart+batchS
Yout = tf.keras.utils.to_categorical (Yout, num_classes=4, dtype='float32') #convert to one hot encoding
yield Xout, Yout
Posting the workaround I've found for the future person coming here from google.
Apparently the issue lies in how keras calls a handwritten generator. When it was called multiple times in a row by using evaluate(gen, steps=N) apparently it returned wrong outputs. There's no documentation around about how to address this or how a generator should be written.
I ended up writing my code using a tf.keras.utils.sequence class and the same previous code now works perfectly. No way to know why.
Here are different factors that affect loss & accuracy:
For Accuracy, we know that it measures the accuracy of the prediction: i.e. correct-classes /total-classes.
While loss tracks the inverse-confidence of the prediction.
A high Loss indicates that although the model is performing well with the prediction, It is becoming uncertain of the prediction it is making.
For example, For an image classification scenario, The image of a cat is passed into two models. Model A predicts {cat: 0.8, dog: 0.2} and model B predicts {cat: 0.6, dog: 0.4}.
Both models will score the same accuracy, but model B will have a higher loss.
On your evaluation part, Based on the documentation
Steps: Integer or None. Total number of steps (batches of samples) before declaring the evaluation round finished. Ignored with the default value of None. If x is a tf.data dataset and steps is None, 'evaluate' will run until the dataset is exhausted. This argument is not supported by array inputs.
So for simplify, it's getting the Nth batch of your validation samples.
It could be that the model prediction is becoming uncertain since the majority of the unknown data falls on those specific steps. which in your case, steps 2 & 3.
So, As the evaluation steps progress, The prediction becomes more uncertain leading to a higher loss.
You might need to retrain your model with more training samples but of course, you need to be careful since you might encounter overfitting.
In terms of data augmentation, you might wanna check this link
In Training Perspective, proper data augmentation is one of the factors that leads to good model performance.

can't reproduce model.fit with GradientTape

I've been trying to investigate into the reason (e.g. by checking weights, gradients and activations during training) why SGD with a 0.001 learning rate worked in training while Adam fails to do so. (Please see my previous post [here](Why is my loss (binary cross entropy) converging on ~0.6? (Task: Natural Language Inference)"Why is my loss (binary cross entropy) converging on ~0.6? (Task: Natural Language Inference)"))
Note: I'm using the same model from my previous post here as well.
using tf.keras, i trained the neural network using model.fit():
model.compile(optimizer=SGD(learning_rate=0.001),
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit(x=ds,
epoch=80,
validation_data=ds_val)
This resulted in a epoch loss graphed below, within the 1st epoch, it's reached a train loss of 0.46 and then ultimately resulting in a train_loss of 0.1241 and val_loss of 0.2849.
I would've used tf.keras.callbacks.Tensorboard(histogram_freq=1) to train the network with both SGD(0.001) and Adam to investigate but it's throwing an InvalidArgumentError on Variable:0, something I can't decipher. So I tried to write a custom training loop using GradientTape and plotting the values.
using tf.GradientTape(), i tried to reproduce the results using the exact same model and dataset, however the epoch loss is training incredibly slowly, reaching train loss of 0.676 after 15 epochs (see graph below), is there something wrong with my implementation? (code below)
#tf.function
def compute_grads(train_batch: Dict[str,tf.Tensor], target_batch: tf.Tensor,
loss_fn: Loss, model: tf.keras.Model):
with tf.GradientTape(persistent=False) as tape:
# forward pass
outputs = model(train_batch)
# calculate loss
loss = loss_fn(y_true=target_batch, y_pred=outputs)
# calculate gradients for each param
grads = tape.gradient(loss, model.trainable_variables)
return grads, loss
BATCH_SIZE = 8
EPOCHS = 15
bce = BinaryCrossentropy()
optimizer = SGD(learning_rate=0.001)
for epoch in tqdm(range(EPOCHS), desc='epoch'):
# - accumulators
epoch_loss = 0.0
for (i, (train_batch, target_dict)) in tqdm(enumerate(ds_train.shuffle(1024).batch(BATCH_SIZE)), desc='step'):
(grads, loss) = compute_grads(train_batch, target_dict['target'], bce, model)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
epoch_loss += loss
avg_epoch_loss = epoch_loss/(i+1)
tensorboard_scalar(writer, name='epoch_loss', data=avg_epoch_loss, step=epoch) # custom helper function
print("Epoch {}: epoch_loss = {}".format(epoch, avg_epoch_loss))
Thanks in advance!
Check if you have shuffle your dataset then the problem may came from the shuffling using the tf.Dataset method. It only shuffled through the dataset one bucket at the time. Using the Keras.Model.fit yielded better results because it probably adds another shuffling.
By adding a shuffling with numpy.random.shuffle it may improve the training performance. From this reference.
The example of applying it into generation of the dataset is:
numpy_data = np.hstack([index_rows.reshape(-1, 1), index_cols.reshape(-1, 1), index_data.reshape(-1, 1)])
np.random.shuffle(numpy_data)
indexes = np.array(numpy_data[:, :2], dtype=np.uint32)
labels = np.array(numpy_data[:, 2].reshape(-1, 1), dtype=np.float32)
train_ds = data.Dataset.from_tensor_slices(
(indexes, labels)
).shuffle(100000).batch(batch_size, drop_remainder=True)
If this not work you may need to use Dataset .repeat(epochs_number) and .shuffle(..., reshuffle_each_iteration=True):
train_ds = data.Dataset.from_tensor_slices(
(np.hstack([index_rows.reshape(-1, 1), index_cols.reshape(-1, 1)]), index_data)
).shuffle(100000, reshuffle_each_iteration=True
).batch(batch_size, drop_remainder=True
).repeat(epochs_number)
for ix, (examples, labels) in train_ds.enumerate():
train_step(examples, labels)
current_epoch = ix // (len(index_data) // batch_size)
This workaround is not beautiful nor natural, for the moment you can use this to shuffle each epoch. It's a known issue and will be fixed, in the future you can use for epoch in range(epochs_number) instead of .repeat()
The solution provided here may also help a lot. You might want to check it out.
If this is not the case, you may want to speed up the TF2.0 GradientTape. This can be the solution:
TensorFlow 2.0 introduces the concept of functions, which translate eager code into graph code.
The usage is pretty straight-forward. The only change needed is that all relevant functions (like compute_loss and apply_gradients) have to be annotated with #tf.function.

TensorBoard Callback in Keras does not respect initial_epoch of fit?

I'm trying to train multiple models in parallel on a single graphics card. To achieve that I need to resume training of models from saved weights which is not a problem. The model.fit() method has even a parameter initial_epoch that lets me tell the model which epoch the loaded model is on. However when i pass a TensorBoard callback to the fit() method in order to monitor the training of the models, on Tensorboard all data is shown on x=0.
Is there a ways to overcome this and adjust the epoch on tensorboard?
By the way: Im running Keras 2.0.6 and Tensorflow 1.3.0.
self.callbacks = [TensorBoardCallback(log_dir='./../logs/'+self.model_name, histogram_freq=0, write_graph=True, write_images=False, start_epoch=self.step_num)]
self.model.fit(x=self.data['X_train'], y=self.data['y_train'], batch_size=self.input_params[-1]['batch_size'], epochs=1, validation_data=(self.data['X_test'], self.data['y_test']), verbose=verbose, callbacks=self.callbacks, shuffle=self.hyperparameters['shuffle_data'], initial_epoch=self.step_num)
self.model.save_weights('./weights/%s.hdf5'%(self.model_name))
self.model.load_weights('./weights/%s.hdf5'%(self.model_name))
self.model.fit(x=self.data['X_train'], y=self.data['y_train'], batch_size=self.input_params[-1]['batch_size'], epochs=1, validation_data=(self.data['X_test'], self.data['y_test']), verbose=verbose, callbacks=self.callbacks, shuffle=self.hyperparameters['shuffle_data'], initial_epoch=self.step_num)
self.model.save_weights('./weights/%s.hdf5'%(self.model_name))
The resulting graph on Tensorboard looks like this which is not what i was hoping for:
Update:
When passing epochs=10 to the first model.fit() the 10 epoch results are displayed in TensorBoard (see picture).
However when reloading the model and running it (with the same callback attached) the on_epoch_end method of the callback gets never called.
Turns out that when i pass the number of episodes to model.fit() to tell it how long to train, it has to be the number FROM the initial_epoch specified. So if initial_epoch=self.step_num then , epochs=self.step_num+10 if i want to train for 10 episodes.
Say we just started fitting our model and our first time epoch count is 30
(please ignore other paramterers just look at epochs and initial_epoch)
model.fit(train_dataloader,validation_data = test_dataloader,epochs =30,steps_per_epoch = len(train_dataloader),callbacks = callback_list)
Now say ,after 30 epoch we want to start again from 31st epoch (you can see this in tesnorboard) by changing our Adam optimizer(or nay optimizer) learning rate
so we can do is
model.optimizer.learning_rate = 0.0005
model1.fit(train_dataloader,validation_data = test_dataloader,initial_epoch=30,epochs =55,steps_per_epoch = len(train_dataloader),callbacks = callback_list)
=> So here initial_epoch= where we have left training last time;
epochs= initial_epoch+num_epoch we want to run for this second fit