I am training a TensorFlow model, in which I include a checkpoint to save the best model (based on the validation RMSE):
checkpoint = ModelCheckpoint(filepath, monitor='val_rmse', verbose=2,
                             save_best_only=True, save_weights_only=False,
                             mode='min', save_freq='epoch')
After training, to visualize the model's training process epoch by epoch using the stats stored in the history object, I do:
plotter.plot({'Basic': history}, metric='loss')
Question: What do I do if I want to visualize the model's training process not over all epochs, but only up to the epoch where the best model is saved? E.g., if I initially set epochs=5,000 but the best model is at epoch 2,000, I want to chart only up to epoch 2,000.
Thanks
From the documentation: the filepath can contain named formatting options, which will be filled with the values of epoch and keys in logs (passed in on_epoch_end).
For example: if filepath is weights.{epoch:02d}-{val_loss:.2f}.hdf5, then the model checkpoints will be saved with the epoch number and the validation loss in the filename, e.g. weights.0150-0.88.hdf5.
You can then inspect the file name and plot until the desired epoch number.
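Alternatively, the epoch-by-epoch metrics are already stored in the History object returned by fit, so you can find the best epoch directly and truncate the plot there. A minimal sketch (assuming val_rmse was logged during training and matplotlib is used for plotting):
import numpy as np
import matplotlib.pyplot as plt

# 0-based index of the epoch with the lowest validation RMSE
best_epoch = int(np.argmin(history.history['val_rmse']))

# plot the training loss only up to (and including) the best epoch
plt.plot(range(1, best_epoch + 2), history.history['loss'][:best_epoch + 1])
plt.xlabel('epoch')
plt.ylabel('loss')
plt.show()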
Related
I'm facing a problem with restoring training from the last checkpoint that I saved. I'm following this code exactly, except that I'm changing the dataset and increasing the number of epochs to 100: Machine Translation French-English notebook
What do I add in order to resume the training? It won't finish in one day, and every time it restarts from epoch 1.
I've found a similar question but the answer didn't solve the problem: Resume training from a certain checkpoint.
I know this is late but I wanted to share the code of a possible solution to this.
Saving a checkpoint and restoring the model from it is pretty easy according to the TensorFlow documentation. The saving can be done with the TensorFlow ModelCheckpoint callback every epoch (or, via the additional save_freq argument, every x batches):
model.compile(..., metrics=['accuracy'])

EPOCHS = 10
checkpoint_filepath = '/path/to/checkpoint'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)  # if this is not the best epoch so far, it is not saved

# validation data is needed so that val_accuracy can be monitored
# (x_train, y_train, x_val, y_val are placeholders for your data)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=EPOCHS, callbacks=[model_checkpoint_callback])
Then, before resuming training or doing prediction, the weights of the saved checkpoint can be loaded like this:
model.load_weights(checkpoint_filepath)
That's it.
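To keep the epoch counter from restarting at 1 when you resume, fit also accepts an initial_epoch argument. A sketch of the idea (assuming you recover the last completed epoch yourself, e.g. by embedding {epoch} in the checkpoint filename as described earlier):
model.load_weights(checkpoint_filepath)

# last_epoch is assumed to be recovered by you (e.g. parsed from the
# checkpoint filename); fit then resumes counting epochs from there
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          initial_epoch=last_epoch, epochs=EPOCHS,
          callbacks=[model_checkpoint_callback])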
I am new to the deep neural network world. I tried to train my own model using the TensorFlow Keras toolkit.
I managed to train a model using the fit function. The accuracy over 50 epochs was good, around 96%, and the model predicts well on new data. The problem is that when I evaluate the loaded model, the results look as if the model wasn't trained at all (accuracy around 50%).
I prepared a small test: I evaluate the model right after fit, then save the model, load it, and evaluate it once again. The results are very different. I thought that maybe the weights weren't loaded properly, but the documentation says that save and load operate on the whole model. Here is my code:
CNNmodelHistory = model.fit(train_data, batch_size=batch_size, validation_data=test_data,
                            steps_per_epoch=train_data.samples // batch_size,
                            epochs=epochs)
scores = model.evaluate(test_data, verbose=0)
print(f'Test loss: {scores[0]} / Test accuracy: {scores[1]}')
# save the model to disk
model.save('gender_detection.modelTest')
modelLoaded = keras.models.load_model('gender_detection.modelTest')
scores = modelLoaded.evaluate(test_data, verbose=0)
print(f'Test loss: {scores[0]} / Test accuracy: {scores[1]}')
And here are the results: the evaluation right after fit reports around 96% accuracy, while the reloaded model reports around 50%.
Do you have any tips on what I am doing wrong?
I'm trying to train my model using transfer learning from a pretrained model, with 30 classes and 7,200 images (80% train, 10% validation, 10% test). My model always overfits despite my changing various parameters. After reading https://www.tensorflow.org/tutorials/images/transfer_learning#create_the_base_model_from_the_pre-trained_convnets, I know that batch normalization layers keep updating their variance statistics during training even when the convolutional base is frozen.
So I set training=False when calling base_model. But I'm still confused: is my code correct? My images are augmented using ImageDataGenerator, unlike the example, where augmentation and preprocessing are applied as part of the base model's input.
This is my code
# Create the model
inputs = keras.Input(shape=(224, 224, 3))
# training=False keeps the base model in inference mode, so its batch
# normalization layers do not update their statistics during training
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(len(CLASS_NAMES), activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
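# the compile step is omitted in the question; something along these
# lines (optimizer and loss are assumptions here) is needed before fit:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])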
history = model.fit_generator(train_data_gen,
                              epochs=epochs,
                              steps_per_epoch=int(np.ceil(total_train / float(BATCH_SIZE))),
                              validation_data=val_data_gen,
                              validation_steps=int(np.ceil(total_val / float(BATCH_SIZE))),
                              callbacks=[cm_callback, tensorboard_callback])
Output
576/576 [==============================] - 157s 273ms/step - loss: 0.0075 - accuracy: 0.9996
144/144 [==============================] - 26s 181ms/step - loss: 0.0092 - accuracy: 1.0000
[0.007482105916197825, 0.99956596]
[0.009182391463279297, 1.0]
If my code is correct, is it good that the validation accuracy equals 1 (suspiciously accurate)?
Using Keras, one typically gets metrics (e.g. accuracy) as part of the progress bar for free. Using the example here:
https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py
After running e.g.
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
Keras will start fitting the model, and will show progress output with something like:
3584/60000 [>.............................] - ETA: 10s - loss: 0.0308 - acc: 0.9905
Suppose I wanted to accomplish the same thing using a TensorFlow canned estimator -- extract the current accuracy for a classifier, and display that as part of a progress bar (done by e.g. a SessionRunHook).
It seems like accuracy metrics aren't provided as part of the default set of operations on a graph. Is there a way I can manually add it myself with a session run hook?
(It looks like it's possible to add operations to the graph as part of the begin() hook, but I'm not sure how I can e.g. request the computation of the model accuracy there.)
accuracy is one of the default metrics in canned classifiers, but it is calculated by the Estimator.evaluate call, not by Estimator.train. You can create a for loop to do what you want:
for epoch in range(num_epochs):
    estimator.train(input_fn=train_input_fn)
    metrics = estimator.evaluate(input_fn=eval_input_fn)
    print(metrics)  # includes 'accuracy' for canned classifiers
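Alternatively, tf.estimator.train_and_evaluate interleaves training and evaluation for you. A minimal sketch (the input functions and step count are placeholders):
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)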
If reading files using string_input_producer, like
filename_queue = tf.train.string_input_producer(files,
                                                num_epochs=num_epochs,
                                                shuffle=shuffle)
how can I get the current epoch number during training? (I want to display this info while training.)
I tried the following. Running
tf.get_default_graph().get_tensor_by_name('input_train/input_producer/limit_epochs/epochs:0')
always returns the epoch limit, while running
tf.get_default_graph().get_tensor_by_name('input_train/input_producer/limit_epochs/CountUpTo:0')
increments by 1 on each evaluation. Neither gives the correct epoch number during training.
Another thing: if I retrain from an existing model, can I get the number of epochs it has already been trained for?
I think the right approach here is to define a global_step variable that you pass to your optimizer (or you can increment it manually).
The TensorFlow Mechanics 101 tutorial provides an example:
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)
Now global_step will be incremented each time the train_op runs. Since you know the size of your dataset and your batch size, you will know what epoch you're currently at.
When you save your model with a tf.train.Saver(), the global_step variable will also be saved. When you restore your model, you can just call global_step.eval() to get back the step value where you left off.
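For example, a minimal sketch of restoring and recovering the epoch (dataset_size and batch_size are assumed to be known; the checkpoint path is a placeholder):
saver = tf.train.Saver()
with tf.Session() as sess:
    # restores all saved variables, including global_step
    saver.restore(sess, '/path/to/checkpoint')
    step = sess.run(global_step)
    steps_per_epoch = dataset_size // batch_size  # batches per epoch
    current_epoch = step // steps_per_epoch
    print('Resuming at step %d, epoch %d' % (step, current_epoch))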
I hope this helps!