Loading a model from checkpoint fails? - tensorflow2.0

https://www.tensorflow.org/beta/tutorials/generative/dcgan
I am following this tutorial for a DCGAN and am trying to restore a model from the checkpoint I saved. But when I try to load the model, it gives me an error:
AssertionError: Nothing except the root object matched a checkpointed value. Typically this means that the checkpoint does not match the Python program. The following objects have no matching checkpointed value: [<tensorflow.python.keras.layers.advanced_activations.LeakyReLU object at 0x7f05b545fc88>, <tf.Variable 'conv2d_transpose_5/kernel:0' shape=(3, 3, 1, 64) dtype=float32, numpy=.......
I tried to modify the checkpoint by only keeping the generator part since that is all I need from here. After that, I do
latest = tf.train.latest_checkpoint(checkpoint_dir)
gen_mod = make_generator_model() #This is already defined in the code
gen_mod.load_weights(latest)
sample = gen_mod(noise,training=False)
which gives me the error. Is there a way to just load the generator part?
What I want is to be able to generate images with the generator model from a given checkpoint.

Since, as you said, the DCGAN tutorial creates a snapshot of two models (a generator and a discriminator), load_weights would have to select the weights relevant only to your generator and it lacks the context to do that.
Instead of trying to load the weights directly into a model, you can restore the checkpoint and, from there, access the generator (or discriminator):
# from the DCGAN tutorial
checkpoint = tf.train.Checkpoint(
generator_optimizer=generator_optimizer,
discriminator_optimizer=discriminator_optimizer,
generator=generator,
discriminator=discriminator,
)
latest = tf.train.latest_checkpoint(checkpoint_dir)
checkpoint.restore(latest)
# classify an image
checkpoint.discriminator(training_images[0:2])
# generate an image
checkpoint.generator(noise)

Related

Cannot load model weights in TensorFlow 2

I cannot load model weights after saving them in TensorFlow 2.2. Weights appear to be saved correctly (I think), however, I fail to load the pre-trained model.
My current code is:
segmentor = sequential_model_1()
discriminator = sequential_model_2()
def save_model(ckp_dir):
# create directory, if it does not exist:
utils.safe_mkdir(ckp_dir)
# save weights
segmentor.save_weights(os.path.join(ckp_dir, 'checkpoint-segmentor'))
discriminator.save_weights(os.path.join(ckp_dir, 'checkpoint-discriminator'))
def load_pretrained_model(ckp_dir):
try:
segmentor.load_weights(os.path.join(ckp_dir, 'checkpoint-segmentor'), skip_mismatch=True)
discriminator.load_weights(os.path.join(ckp_dir, 'checkpoint-discriminator'), skip_mismatch=True)
print('Loading pre-trained model from: {0}'.format(ckp_dir))
except ValueError:
print('No pre-trained model available.')
Then I have the training loop:
# training loop:
for epoch in range(num_epochs):
for image, label in dataset:
train_step()
# save best model I find during training:
if this_is_the_best_model_on_validation_set():
save_model(ckp_dir='logs_dir')
And then, at the end of the training "for loop", I want to load the best model and do a test with it. Hence, I run:
# load saved model and do a test:
load_pretrained_model(ckp_dir='logs_dir')
test()
However, this results in a ValueError. I checked the directory where the weights should be saved, and there they are!
Any idea what is wrong with my code? Am I loading the weights incorrectly?
Thank you!
Ok here is your problem - the try-except block you have is obscuring the real issue. Removing it gives the ValueError:
ValueError: When calling model.load_weights, skip_mismatch can only be set to True when by_name is True.
There are two ways to mitigate this - you can either call load_weights with by_name=True, or remove skip_mismatch=True depending on your needs. Either case works for me when testing your code.
Another consideration is that you when you store both the discriminator and segmentor checkpoints to the log directory, you overwrite the checkpoint file each time. This contains two strings that give the path to the specific model checkpoint files. Since you save discriminator second, every time this file will say discriminator with no reference to segmentor. You can mitigate this by storing each model in two subdirectories in the log directory instead, i.e.
logs_dir/
+ discriminator/
+ checkpoint
+ ...
+ segmentor/
+ checkpoint
+ ...
Although in the current state your code would work in this case.

Is it possible to load and train a model, if only thing we have is check point files?

Is it possible to load and train a model from check point files ?
We have information about input and output Tensor shape.
Check point files
Yes. You can use tensorflow-keras following this example.
https://www.tensorflow.org/guide/checkpoint
Directly from tensorflow documentation.
List checkpoints
!ls ./tf_ckpts
which produces
checkpoint ckpt-8.data-00000-of-00001 ckpt-9.index
ckpt-10.data-00000-of-00001 ckpt-8.index
ckpt-10.index ckpt-9.data-00000-of-00001
Recover from Checkpoint
Calling restore() on a tf.train.Checkpoint object queues the requested restorations, restoring variable values as soon as there's a matching path from the Checkpoint object. For example we can load just the bias from the model we defined above by reconstructing one path to it through the network and the layer.
to_restore = tf.Variable(tf.zeros([5])) # variables from your model.
print(to_restore.numpy()) # All zeros
fake_layer = tf.train.Checkpoint(bias=to_restore)
fake_net = tf.train.Checkpoint(l1=fake_layer)
new_root = tf.train.Checkpoint(net=fake_net)
status = new_root.restore(tf.train.latest_checkpoint('./tf_ckpts/'))
print(to_restore.numpy()) # We get the restored value now
To double check that it was restored you can type:
status.assert_existing_objects_matched()
and get the following output.
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f1d796da278>
Yes, it is possible if the checkpoint contains parameters of the model (parameters as W and b in W*x +b). I think you have that, in case of transfer learning, you can use this based on your files.
# Loads the weights
model.load_weights(checkpoint_path)
You should know the architecture of the model and create the model before using this. In some models, there is a specific way to load checkpoint.
Also, check this out: https://www.tensorflow.org/tutorials/keras/save_and_load

Tensorflow-Probability. - Saving and restoring checkpoints for bayesian neural network

I'd been looking at the Tensorflow Probability library and trying to modify the example in bayesian network example, hoping that I can save checkpoints and then restore them. I first started trying to use tf.train.Checkpoint but, although, I was not getting any error, neither when saving nor when restoring, didnt seem to restart the training from the previous checkpoint as the accuracy was completely different value.
I then tried using tf.keras.models.model.save, which again does save a file, but when trying to restore, I get the error: ValueError: Unknown layer: Conv2DFlipout when it is trying to deserialise the layer.
To be honest I dont know which way to go now, if somebody could point me to the right direction.
Thanks!
Giovanna
This is what I have so far to restore:
if FLAGS.architecture == "resnet":
model_fn = bayesian_resnet.bayesian_resnet
else:
model_fn = bayesian_vgg.bayesian_vgg
model = model_fn(
IMAGE_SHAPE,
num_classes=4,
kernel_posterior_scale_mean=FLAGS.kernel_posterior_scale_mean,
kernel_posterior_scale_constraint=FLAGS.kernel_posterior_scale_constraint)
print(images)
#check if saved checkpoint exists
exists = os.path.isfile(FLAGS.model_dir+"checkpoint.hdf5")
if exists:
model = tf.keras.models.load_model(FLAGS.model_dir+"checkpoint.hdf5")
logits = model(images)
labels_distribution = tfd.Categorical(logits=logits)
# Perform KL annealing. The optimal number of annealing steps
# depends on the dataset and architecture.
t = tf.Variable(0.0)
#kl_regularizer = t / (FLAGS.kl_annealing * len(x_train) / FLAGS.batch_size)
...

How to get the output of a maxpool layer in a pre-trained model in TensorFlow?

I have a model that I trained. I wish to extract from the model the output of an intermediate maxpool layer.
I tried the following
saver = tf.train.import_meta_graph(BASE_DIR + LOG_DIR + '/model.ckpt.meta')
saver.restore(sess,tf.train.latest_checkpoint(BASE_DIR + LOG_DIR))
sess.run("maxpool/maxpool",feed_dict=feed_dict)
here, feed_dict contains the placeholders and their contents for this run in a dictionary.
I keep getting the following error
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_1_1' with dtype float and shape...
what can be the cause of this? I generated all of the placeholders and input them in the feed dictionary.
I ran in to a similar issue and it was frustrating. What got me around it was filling out the name field for every variable and operation that I wanted to call later. You also may need to add your maxpool/maxpool op to a collection with tf.add_to_collection('name_for_maxpool_op', maxpool_op_handle). You can then restore the ops and named tensors with:
# Restore from metagraph.
saver = tf.train.import_meta_graph(...)
sess = tf.Session()
saver = restore(sess, ...)
graph = sess.graph
# Restore your ops and tensors.
maxpool_op = tf.get_collection('name_for_maxpool_op')[0] # returns a list, you want the first element
a_tensor = graph.get_tensor_by_name('tensor_name:0') # need the :0 added to your name
Then you would build your feed_dict using your restored tensors. More information can be found here. Also, as you mentioned in your comment, you need to pass the op itself to sess.run, not it's name:
sess.run(maxpool_op, feed_dict=feed_dict)
You can access your tensors and ops from a restored metagraph even if you did not name them (to avoid retraining the model with new fancy tensor names, for instance), but it can be a bit of a pain. The names given to the tensors automatically are not always the most transparent. You can list the names of all variables in your graph with:
print([v.name for v in tf.all_variables()])
You can hopefully find the name that you are looking for there and then restore that tensor using graph.get_tensor_by_name as described above.

TensorFlow: load checkpoint, but only parts of it (convolutional layers)

Is it possible to only load specific layers (convolutional layers) out of one checkpoint file?
I've trained some CNNs fully-supervised and saved my progress (I'm doing object localization). To do auto-labelling I thought of building a weakly-supervised CNNs out of my current model...but since the weakly-supervised version has different fully-connected layers, I would like to select only the convolutional filters of my TensorFlow checkpoint file.
Of course I could manually save the weights of the corresponding layers, but due to the fact that they're already included in TensorFlow's checkpoint file I would like to extract them there, in order to have one single storing file.
TensorFlow 2.1 has many different public facilities for loading checkpoints (model.save, Checkpoint, saved_model, etc), but to the best of my knowledge, none of them has filtering API. So, let me suggest a snippet for hard cases which uses tooling from the TF2.1 internal development tests.
checkpoint_filename = '/path/to/our/weird/checkpoint.ckpt'
model = tf.keras.Model( ... ) # TF2.0 Model to initialize with the above checkpoint
variables_to_load = [ ... ] # List of model weight names to update.
from tensorflow.python.training.checkpoint_utils import load_checkpoint, list_variables
reader = load_checkpoint(checkpoint_filename)
for w in model.weights:
name=w.name.split(':')[0] # See (b/29227106)
if name in variables_to_load:
print(f"Updating {name}")
w.assign(reader.get_tensor(
# (Optional) Handle variable renaming
{'/var_name1/in/model':'/var_name1/in/checkpoint',
'/var_name2/in/model':'/var_name2/in/checkpoint',
# ... and so on
}.get(name,name)))
Note: model.weights and list_variables may help to inspect variables in Model and in the checkpoint
Note also, that this method will not restore model's optimizer state.