I want to do a very simple task. Let us assume that I have trained a model and saved multiple checkpoints and metadata for it using tf.estimator. Let us further assume that I have 3 checkpoints: 1, 2 and 3. While evaluating the trained results on TensorBoard, I realize that checkpoint 2 provides the better weights for my objective.
Therefore I want to load checkpoint 2 and make my predictions. What I want to ask, simply, is: is it possible to delete checkpoint 3 from the model dir and let the estimator automatically load checkpoint 2, or is there anything else I can do to load a specific checkpoint for my predictions?
Thank you.
Yes, you can. By default, Estimator loads the latest available checkpoint in model_dir, so you can either delete files manually or specify the checkpoint file with
warm_start = tf.estimator.WarmStartSettings(ckpt_to_initialize_from='file.ckpt')
and pass it to the Estimator:
tf.estimator.Estimator(model_fn=model_fn,
                       config=run_config,
                       model_dir='dir',
                       warm_start_from=warm_start)
The latter option will not mess up the TensorBoard summaries, so it's generally cleaner.
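If the goal is specifically prediction, Estimator.predict also accepts a checkpoint_path argument (at least in the TF 1.x estimator API), which sidesteps both manual deletion and warm starting. A minimal sketch, assuming model_fn and predict_input_fn come from your own setup and that checkpoint 2 lives at a hypothetical path 'dir/model.ckpt-2000':
import tensorflow as tf

estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir='dir')

# Point prediction at a specific checkpoint instead of the latest one in model_dir.
predictions = estimator.predict(
    input_fn=predict_input_fn,
    checkpoint_path='dir/model.ckpt-2000')  # hypothetical path to checkpoint 2

for pred in predictions:
    print(pred)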
Related
I made checkpoints every 1000 steps of training, and I have 16 files in my checkpoints directory. However, it seems that when I want to retrieve the latest one, the model reverts to its pre-trained state. I assume this has something to do with the summary logs not documenting that the later checkpoints exist.
chkpt.restore(tf.train.latest_checkpoint(chkpt_dir))
# fit(train_ds, test_ds, steps=100000)
for i in range(10):
    ex_input, ex_output = next(iter(test_ds.take(1)))
    generate_images(generator, ex_input, ex_output, i, test=True)
How can I manually ask the checkpoint manager to retrieve a particular checkpoint file, as opposed to .latest_checkpoint()?
Edit: Solved it myself: open the checkpoint state file (the plain-text file named checkpoint in your checkpoint folder) and set the suffix number to whichever checkpoint you want to load.
You can use the checkpoint.restore() method to restore a checkpoint of your preference. For example, if you want to load the checkpoint at iteration 1000, you would write:
checkpoint.restore('./test/model.ckpt-1000')
For more details, please refer to this documentation. Thank you.
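If the checkpoints are tracked by a tf.train.CheckpointManager, its checkpoints property also lists the checkpoint prefixes it knows about, so you can restore any of them instead of editing the state file by hand. A minimal sketch, reusing the chkpt object and chkpt_dir from the question; the 'ckpt-5' prefix is hypothetical:
import tensorflow as tf

# Build a manager over the existing directory; max_to_keep=None keeps everything.
manager = tf.train.CheckpointManager(chkpt, chkpt_dir, max_to_keep=None)

print(manager.checkpoints)               # the tracked checkpoint prefixes, oldest first
chkpt.restore(manager.checkpoints[4])    # e.g. pick the 5th saved checkpoint

# Or restore an explicit prefix directly:
chkpt.restore(chkpt_dir + '/ckpt-5')     # hypothetical prefix name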
I have been training several models using 10-fold CV and added the ModelCheckpoint callback which saves the model with the lowest validation loss to an HDF5 file. However, for a while I would then call model.save(filepath) right after training.
I only recently came to the realization that this last call probably saves the model as trained after the very last epoch, and that the saved checkpoint is not being used at all. Is my assumption correct? If so, is it normal for the best models from the checkpoint files to score lower than the ones saved with model.save()?
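For reference, a minimal sketch of the pattern in question for one fold; build_model, the data arrays and the file names are hypothetical placeholders. The callback writes the lowest-validation-loss epoch to best_model.h5, whereas a model.save() call after fit() captures the final-epoch weights, so the two files generally differ:
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import load_model

model = build_model()  # hypothetical model factory for this CV fold

# Save only the epoch with the lowest validation loss to HDF5.
ckpt = ModelCheckpoint('best_model.h5', monitor='val_loss',
                       save_best_only=True, verbose=1)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=50, callbacks=[ckpt])

# model.save('last_model.h5') would store the final-epoch weights;
# to evaluate the best epoch, reload the checkpoint file instead.
best_model = load_model('best_model.h5')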
I'm writing a process-based implementation of A3C with TensorFlow in eager mode. After every gradient update, my master model writes its parameters as checkpoints to a folder. The workers then update their parameters by loading the latest checkpoints from this folder. However, there is a problem.
Often, while a worker is reading the last available checkpoint from the folder, the master network writes new checkpoints to the folder and sometimes erases the checkpoint that the worker is reading. A simple solution would be to raise the maximum number of checkpoints to keep. However, tfe.Checkpoint and tfe.Saver don't have a parameter for choosing the maximum to keep.
Is there a way to achieve this?
For tf.train.Saver you can specify max_to_keep:
tf.train.Saver(max_to_keep=10)
and max_to_keep seems to be present in both tfe.Saver and its underlying tf.train.Saver.
I haven't tried whether it works, though.
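For context, this is how max_to_keep behaves with the plain graph-mode tf.train.Saver; whether tfe.Saver forwards the same argument in eager mode is exactly the open question above, so treat this only as a minimal sketch of the keep-N behaviour, assuming a model graph has already been built:
import tensorflow as tf

saver = tf.train.Saver(max_to_keep=10)  # only the 10 most recent checkpoints are kept

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100000):
        # ... run one training step ...
        if step % 1000 == 0:
            # Checkpoints older than the last 10 are deleted automatically.
            saver.save(sess, '/tmp/model/ckpt', global_step=step)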
It seems the suggested way of managing checkpoint deletion is to use the CheckpointManager:
import tensorflow as tf

checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
manager = tf.contrib.checkpoint.CheckpointManager(
    checkpoint, directory="/tmp/model", max_to_keep=5)
status = checkpoint.restore(manager.latest_checkpoint)
while True:
    # train
    manager.save()
I have trained a sequence-to-sequence model in TensorFlow for about 100,000 steps without specifying the summary operations required for TensorBoard.
I have checkpoint files for every 1000 steps. Is there any way to visualize the data without having to retrain the entire model, i.e. to extract the summaries from the checkpoint files and feed them to TensorBoard?
I tried running TensorBoard directly on the checkpoint files, which obviously said no scalar summaries were found. I also tried inserting the summary operations into the code, but that requires me to completely retrain the model for the summaries to be created.
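One thing worth knowing: checkpoints only store variable values, not the scalar metrics (loss, accuracy, etc.) you would normally log, so those particular curves cannot be recovered. What can be done is to replay statistics of the saved weights into an event file after the fact. A minimal sketch using TF 1.x APIs, assuming the checkpoints follow the usual model.ckpt-<step> naming and live in a hypothetical ckpt_dir:
import numpy as np
import tensorflow as tf

ckpt_dir = '/path/to/checkpoints'   # hypothetical location of the checkpoint files
writer = tf.summary.FileWriter('/tmp/replayed_summaries')

# all_model_checkpoint_paths lists every checkpoint recorded in the state file.
state = tf.train.get_checkpoint_state(ckpt_dir)
for ckpt_path in state.all_model_checkpoint_paths:
    step = int(ckpt_path.split('-')[-1])
    reader = tf.train.NewCheckpointReader(ckpt_path)
    for name in reader.get_variable_to_shape_map():
        value = reader.get_tensor(name)
        # Log one scalar per variable, e.g. its L2 norm, against the step number.
        summary = tf.Summary(value=[tf.Summary.Value(
            tag=name + '/norm', simple_value=float(np.linalg.norm(value)))])
        writer.add_summary(summary, global_step=step)
writer.flush()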
Is it possible to only load specific layers (convolutional layers) out of one checkpoint file?
I've trained some CNNs fully supervised and saved my progress (I'm doing object localization). To do auto-labelling, I thought of building a weakly-supervised CNN out of my current model... but since the weakly-supervised version has different fully-connected layers, I would like to select only the convolutional filters from my TensorFlow checkpoint file.
Of course I could manually save the weights of the corresponding layers, but since they are already included in TensorFlow's checkpoint file, I would like to extract them from there, in order to have a single storage file.
TensorFlow 2.1 has many different public facilities for loading checkpoints (model.save, Checkpoint, saved_model, etc.), but to the best of my knowledge none of them has a filtering API. So let me suggest a snippet for hard cases which uses tooling from the TF 2.1 internal development tests.
checkpoint_filename = '/path/to/our/weird/checkpoint.ckpt'
model = tf.keras.Model( ... )  # TF2.0 Model to initialize with the above checkpoint
variables_to_load = [ ... ]    # List of model weight names to update.

from tensorflow.python.training.checkpoint_utils import load_checkpoint, list_variables

reader = load_checkpoint(checkpoint_filename)
for w in model.weights:
    name = w.name.split(':')[0]  # See (b/29227106)
    if name in variables_to_load:
        print(f"Updating {name}")
        w.assign(reader.get_tensor(
            # (Optional) Handle variable renaming
            {'/var_name1/in/model': '/var_name1/in/checkpoint',
             '/var_name2/in/model': '/var_name2/in/checkpoint',
             # ... and so on
            }.get(name, name)))
Note: model.weights and list_variables may help to inspect the variables in the Model and in the checkpoint.
Note also that this method will not restore the model's optimizer state.
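Since the note above mentions list_variables, here is a minimal example of using it to see which names and shapes the checkpoint actually contains before building variables_to_load and the renaming map; checkpoint_filename is the same placeholder as in the snippet above:
from tensorflow.python.training.checkpoint_utils import list_variables

# Returns a list of (variable_name, shape) pairs stored in the checkpoint.
for name, shape in list_variables(checkpoint_filename):
    print(name, shape)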