Cannot load model weights in TensorFlow 2 - tensorflow

I cannot load model weights after saving them in TensorFlow 2.2. The weights appear to be saved correctly (I think), but I fail to load the pre-trained model.
My current code is:
segmentor = sequential_model_1()
discriminator = sequential_model_2()
def save_model(ckp_dir):
    # create directory, if it does not exist:
    utils.safe_mkdir(ckp_dir)
    # save weights
    segmentor.save_weights(os.path.join(ckp_dir, 'checkpoint-segmentor'))
    discriminator.save_weights(os.path.join(ckp_dir, 'checkpoint-discriminator'))
def load_pretrained_model(ckp_dir):
    try:
        segmentor.load_weights(os.path.join(ckp_dir, 'checkpoint-segmentor'), skip_mismatch=True)
        discriminator.load_weights(os.path.join(ckp_dir, 'checkpoint-discriminator'), skip_mismatch=True)
        print('Loading pre-trained model from: {0}'.format(ckp_dir))
    except ValueError:
        print('No pre-trained model available.')
Then I have the training loop:
# training loop:
for epoch in range(num_epochs):
    for image, label in dataset:
        train_step()
    # save best model I find during training:
    if this_is_the_best_model_on_validation_set():
        save_model(ckp_dir='logs_dir')
And then, at the end of the training "for loop", I want to load the best model and do a test with it. Hence, I run:
# load saved model and do a test:
load_pretrained_model(ckp_dir='logs_dir')
test()
However, this results in a ValueError. I checked the directory where the weights should be saved, and there they are!
Any idea what is wrong with my code? Am I loading the weights incorrectly?
Thank you!

Ok here is your problem - the try-except block you have is obscuring the real issue. Removing it gives the ValueError:
ValueError: When calling model.load_weights, skip_mismatch can only be set to True when by_name is True.
There are two ways to mitigate this - you can either call load_weights with by_name=True, or remove skip_mismatch=True depending on your needs. Either case works for me when testing your code.
Another consideration is that when you store both the discriminator and segmentor checkpoints in the log directory, you overwrite the checkpoint file each time. This file contains two strings that give the path to the specific model checkpoint files. Since you save the discriminator second, the file will always point to the discriminator, with no reference to the segmentor. You can mitigate this by storing each model in its own subdirectory of the log directory instead, i.e.
logs_dir/
+ discriminator/
  + checkpoint
  + ...
+ segmentor/
  + checkpoint
  + ...
That said, your code in its current state would still work without this change.
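A minimal sketch of the per-model subdirectory idea, using only the standard library. The helper name checkpoint_prefix is an assumption made for illustration. It builds a separate prefix for each model and creates the subdirectory if needed.

```python
import os
import tempfile


def checkpoint_prefix(ckp_dir, model_name):
    """Return a per-model checkpoint prefix, creating its subdirectory."""
    model_dir = os.path.join(ckp_dir, model_name)
    os.makedirs(model_dir, exist_ok=True)
    return os.path.join(model_dir, 'checkpoint-' + model_name)


# A temporary directory stands in for 'logs_dir' here.
with tempfile.TemporaryDirectory() as logs_dir:
    seg_prefix = checkpoint_prefix(logs_dir, 'segmentor')
    dis_prefix = checkpoint_prefix(logs_dir, 'discriminator')
    created = sorted(os.listdir(logs_dir))
    # With real models you would then call, for example:
    # segmentor.save_weights(seg_prefix)
    # discriminator.save_weights(dis_prefix)
```

Each model then writes its own checkpoint file inside its own folder, so saving one no longer overwrites the bookkeeping of the other.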

Related

Is it possible to load and train a model if the only thing we have is checkpoint files?

Is it possible to load and train a model from checkpoint files?
We have information about the input and output tensor shapes.
Check point files
Yes. You can use tensorflow-keras following this example.
https://www.tensorflow.org/guide/checkpoint
Directly from tensorflow documentation.
List checkpoints
!ls ./tf_ckpts
which produces
checkpoint ckpt-8.data-00000-of-00001 ckpt-9.index
ckpt-10.data-00000-of-00001 ckpt-8.index
ckpt-10.index ckpt-9.data-00000-of-00001
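Each checkpoint in this listing is a prefix (for example ckpt-10) shared by an .index file and one or more .data shards. As a rough sketch, the prefixes can be recovered from the file names and ordered by step number. This is not what tf.train.latest_checkpoint does internally (it reads the checkpoint state file), and the function name is an assumption.

```python
import re


def checkpoint_prefixes(filenames):
    """Return checkpoint prefixes found in `filenames`, ordered by step."""
    steps = {}
    for name in filenames:
        # Match names such as 'ckpt-10.index' and capture prefix and step.
        match = re.fullmatch(r'(.+-(\d+))\.index', name)
        if match:
            steps[match.group(1)] = int(match.group(2))
    return sorted(steps, key=steps.get)


listing = [
    'checkpoint', 'ckpt-8.data-00000-of-00001', 'ckpt-9.index',
    'ckpt-10.data-00000-of-00001', 'ckpt-8.index',
    'ckpt-10.index', 'ckpt-9.data-00000-of-00001',
]
prefixes = checkpoint_prefixes(listing)
print(prefixes)  # ['ckpt-8', 'ckpt-9', 'ckpt-10']
```

Sorting by the numeric step matters: a plain string sort would place ckpt-10 before ckpt-8.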
Recover from Checkpoint
Calling restore() on a tf.train.Checkpoint object queues the requested restorations, restoring variable values as soon as there's a matching path from the Checkpoint object. For example we can load just the bias from the model we defined above by reconstructing one path to it through the network and the layer.
to_restore = tf.Variable(tf.zeros([5])) # variables from your model.
print(to_restore.numpy()) # All zeros
fake_layer = tf.train.Checkpoint(bias=to_restore)
fake_net = tf.train.Checkpoint(l1=fake_layer)
new_root = tf.train.Checkpoint(net=fake_net)
status = new_root.restore(tf.train.latest_checkpoint('./tf_ckpts/'))
print(to_restore.numpy()) # We get the restored value now
To double check that it was restored you can type:
status.assert_existing_objects_matched()
and get the following output.
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f1d796da278>
Yes, it is possible if the checkpoint contains the parameters of the model (such as W and b in W*x + b). I think that is your situation. As in transfer learning, you can then load them from your files:
# Loads the weights
model.load_weights(checkpoint_path)
You need to know the architecture of the model and create the model before using this. Some models require a specific way to load the checkpoint.
Also, check this out: https://www.tensorflow.org/tutorials/keras/save_and_load

Tensorflow-Probability. - Saving and restoring checkpoints for bayesian neural network

I've been looking at the TensorFlow Probability library and trying to modify the Bayesian neural network example so that I can save checkpoints and then restore them. I first tried tf.train.Checkpoint. I got no error when saving or restoring, but training did not seem to resume from the previous checkpoint, as the accuracy was completely different.
I then tried tf.keras.models.Model.save, which does save a file, but restoring it fails with ValueError: Unknown layer: Conv2DFlipout while the layer is being deserialised.
To be honest I don't know which way to go now. Could somebody point me in the right direction?
Thanks!
Giovanna
This is what I have so far to restore:
if FLAGS.architecture == "resnet":
    model_fn = bayesian_resnet.bayesian_resnet
else:
    model_fn = bayesian_vgg.bayesian_vgg
model = model_fn(
    IMAGE_SHAPE,
    num_classes=4,
    kernel_posterior_scale_mean=FLAGS.kernel_posterior_scale_mean,
    kernel_posterior_scale_constraint=FLAGS.kernel_posterior_scale_constraint)
print(images)
# check if saved checkpoint exists
exists = os.path.isfile(FLAGS.model_dir+"checkpoint.hdf5")
if exists:
    model = tf.keras.models.load_model(FLAGS.model_dir+"checkpoint.hdf5")
logits = model(images)
labels_distribution = tfd.Categorical(logits=logits)
# Perform KL annealing. The optimal number of annealing steps
# depends on the dataset and architecture.
t = tf.Variable(0.0)
#kl_regularizer = t / (FLAGS.kl_annealing * len(x_train) / FLAGS.batch_size)
...

Tensorflow Combining Two Models End to End

In tensorflow it is fairly easy to load trained models back into tensorflow through the use of checkpoints. However, this use case seems oriented towards users that want to either run evaluation or additional training on a checkpointed model.
What is the simplest way in tensorflow to load a pre-trained model and use it (without training) to produce results which will then be used in a new model?
Right now the most promising approach seems to be using tf.Graph.get_tensor_by_name() and tf.stop_gradient() to get the input and output tensors of the trained model loaded with tf.train.import_meta_graph().
What is the best-practice setup for this sort of thing?
The most straightforward solution would be to freeze the pre-trained model variables using this function:
import tensorflow as tf  # TensorFlow 1.x API
def freeze_graph(model_dir, output_node_names):
    """Extract the sub graph defined by the output nodes and convert
    all its variables into constants.
    Args:
        model_dir: the root folder containing the checkpoint state file
        output_node_names: a string, containing all the output node's names,
            comma separated
    """
    if not tf.gfile.Exists(model_dir):
        raise AssertionError(
            "Export directory doesn't exist")
    if not output_node_names:
        print("You need to supply the name of the output node")
        return -1
    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(model_dir)
    input_checkpoint = checkpoint.model_checkpoint_path
    # We determine the directory in which the frozen graph could be stored
    absolute_model_dir = "/".join(input_checkpoint.split('/')[:-1])
    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True
    # We start a session using a temporary fresh Graph
    with tf.Session(graph=tf.Graph()) as sess:
        # We import the meta graph in the current default Graph
        # (assumes the .meta file sits next to the checkpoint, where tf.train.Saver writes it)
        saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)
        # We restore the weights
        saver.restore(sess, input_checkpoint)
        # We use a built-in TF helper to export variables to constants
        frozen_graph = tf.graph_util.convert_variables_to_constants(
            sess,  # The session is used to retrieve the weights
            tf.get_default_graph().as_graph_def(),  # The graph_def is used to retrieve the nodes
            output_node_names.split(",")  # The output node names are used to select the useful nodes
        )
    return frozen_graph
Then you'd be able to build your new-model on top of the pre-trained model:
# Get the frozen graph (freeze_graph returns a GraphDef, not a Graph)
frozen_graph_def = freeze_graph(YOUR_MODEL_DIR, YOUR_OUTPUT_NODES)
# Import the frozen GraphDef into a new default graph
with tf.Graph().as_default():
    # Get the output tensor from the pre-trained model
    (pre_trained_model_result,) = tf.import_graph_def(
        frozen_graph_def,
        return_elements=[OUTPUT_TENSOR_NAME_OF_PRETRAINED_MODEL],
        name="")
    # Let's say you want to get the pre-trained model result's square root
    my_new_operation_results = tf.sqrt(pre_trained_model_result)

Saving tf.trainable_variables() using convert_variables_to_constants

I have a Keras model that I would like to convert to a Tensorflow protobuf (e.g. saved_model.pb).
This model comes from transfer learning on the VGG-19 network, in which the head was cut off and replaced with fully-connected + softmax layers that were trained while the rest of the VGG-19 network was frozen.
I can load the model in Keras, and then use keras.backend.get_session() to run the model in tensorflow, generating the correct predictions:
frame = preprocess(cv2.imread("path/to/img.jpg"))
keras_model = keras.models.load_model("path/to/keras/model.h5")
keras_prediction = keras_model.predict(frame)
print(keras_prediction)
with keras.backend.get_session() as sess:
    tvars = tf.trainable_variables()
    output = sess.graph.get_tensor_by_name('Softmax:0')
    input_tensor = sess.graph.get_tensor_by_name('input_1:0')
    tf_prediction = sess.run(output, {input_tensor: frame})
    print(tf_prediction) # this matches keras_prediction exactly
If I don't include the line tvars = tf.trainable_variables(), then the tf_prediction variable is completely wrong and doesn't match the output from keras_prediction at all. In fact all the values in the output (single array with 4 probability values) are exactly the same (~0.25, all adding to 1). This made me suspect that weights for the head are just initialized to 0 if tf.trainable_variables() is not called first, which was confirmed after inspecting the model variables. In any case, calling tf.trainable_variables() causes the tensorflow prediction to be correct.
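The uniform output is what a softmax produces when every class receives the same logit, which is the case when the final layer's weights and biases are all zero. A small standard-library illustration (the softmax helper is written here for the example and is not taken from Keras):

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# With an all-zero final layer, every one of the 4 classes gets logit 0.
uniform = softmax([0.0, 0.0, 0.0, 0.0])
print(uniform)  # [0.25, 0.25, 0.25, 0.25]
```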
The problem is that when I try to save this model, the variables from tf.trainable_variables() don't actually get saved to the .pb file:
with keras.backend.get_session() as sess:
    tvars = tf.trainable_variables()
    constant_graph = graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), ['Softmax'])
    graph_io.write_graph(constant_graph, './', 'saved_model.pb', as_text=False)
What I am asking is, how can I save a Keras model as a Tensorflow protobuf with the tf.trainable_variables() intact?
Thanks so much!
So your approach of freezing the variables in the graph (converting them to constants) should work, but it isn't necessary and is trickier than the other approaches (more on this below). If you want graph freezing for some reason (e.g. exporting to a mobile device), I'd need more details to help debug, as I'm not sure what implicit stuff Keras is doing behind the scenes with your graph. However, if you just want to save a graph and load it later, I can explain how to do that (though no guarantees that whatever Keras is doing won't screw it up; happy to help debug that).
So there are actually two formats at play here. One is the GraphDef, which is used for Checkpointing, as it does not contain metadata about inputs and outputs. The other is a MetaGraphDef which contains metadata and a graph def, the metadata being useful for prediction and running a ModelServer (from tensorflow/serving).
In either case you need to do more than just call graph_io.write_graph because the variables are usually stored outside the graphdef.
There are wrapper libraries for both these use cases. tf.train.Saver is primarily used for saving and restoring checkpoints.
However, since you want prediction, I would suggest using a tf.saved_model.builder.SavedModelBuilder to build a SavedModel binary. I've provided some boilerplate for this below:
from tensorflow.python.saved_model.signature_constants import DEFAULT_SERVING_SIGNATURE_DEF_KEY as DEFAULT_SIG_DEF
builder = tf.saved_model.builder.SavedModelBuilder('./mymodel')
with keras.backend.get_session() as sess:
    output = sess.graph.get_tensor_by_name('Softmax:0')
    input_tensor = sess.graph.get_tensor_by_name('input_1:0')
    sig_def = tf.saved_model.signature_def_utils.predict_signature_def(
        {'input': input_tensor},
        {'output': output}
    )
    builder.add_meta_graph_and_variables(
        sess, tf.saved_model.tag_constants.SERVING,
        signature_def_map={
            DEFAULT_SIG_DEF: sig_def
        }
    )
    builder.save()
After running this code you should have a mymodel/saved_model.pb file as well as a directory mymodel/variables/ with protobufs corresponding to the variable values.
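To confirm the export produced that layout, a quick standard-library check along these lines may help. The function name is hypothetical, and the check only looks at file and directory names, not at their contents.

```python
import os
import tempfile


def looks_like_saved_model(export_dir):
    """Check for saved_model.pb and a variables/ directory in export_dir."""
    has_graph = os.path.isfile(os.path.join(export_dir, 'saved_model.pb'))
    has_variables = os.path.isdir(os.path.join(export_dir, 'variables'))
    return has_graph and has_variables


# Build an empty directory with the same layout, purely to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    export_dir = os.path.join(tmp, 'mymodel')
    os.makedirs(os.path.join(export_dir, 'variables'))
    before = looks_like_saved_model(export_dir)  # saved_model.pb still missing
    open(os.path.join(export_dir, 'saved_model.pb'), 'wb').close()
    after = looks_like_saved_model(export_dir)
```

On a real export you would call looks_like_saved_model('./mymodel') after builder.save().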
Then to load the model again, simply use tf.saved_model.loader:
# Does Keras give you the ability to start with a fresh graph?
# If not you'll need to do this in a separate program to avoid
# conflicts with the old default graph
with tf.Session(graph=tf.Graph()) as sess:
    meta_graph_def = tf.saved_model.loader.load(
        sess,
        tf.saved_model.tag_constants.SERVING,
        './mymodel'
    )
    # From this point variables and graph structure are restored
    sig_def = meta_graph_def.signature_def[DEFAULT_SIG_DEF]
    # The signature stores TensorInfo entries; sess.run needs their tensor names
    print(sess.run(sig_def.outputs['output'].name, feed_dict={sig_def.inputs['input'].name: frame}))
Obviously there's a more efficient prediction available with this code through tensorflow/serving, or Cloud ML Engine, but this should work.
It's possible that Keras is doing something under the hood which will interfere with this process as well, and if so we'd like to hear about it (and I'd like to make sure that Keras users are able to freeze graphs as well, so if you want to send me a gist with your full code or something maybe I can find someone who knows Keras well to help me debug.)
EDIT: You can find an end to end example of this here: https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/keras/trainer/model.py#L85

TensorFlow: load checkpoint, but only parts of it (convolutional layers)

Is it possible to only load specific layers (convolutional layers) out of one checkpoint file?
I've trained some CNNs fully-supervised and saved my progress (I'm doing object localization). To do auto-labelling I thought of building a weakly-supervised CNN out of my current model, but since the weakly-supervised version has different fully-connected layers, I would like to select only the convolutional filters of my TensorFlow checkpoint file.
Of course I could manually save the weights of the corresponding layers, but due to the fact that they're already included in TensorFlow's checkpoint file I would like to extract them there, in order to have one single storing file.
TensorFlow 2.1 has many different public facilities for loading checkpoints (model.save, Checkpoint, saved_model, etc.), but to the best of my knowledge, none of them has a filtering API. So, let me suggest a snippet for hard cases which uses tooling from the TF2.1 internal development tests.
checkpoint_filename = '/path/to/our/weird/checkpoint.ckpt'
model = tf.keras.Model( ... ) # TF2.0 Model to initialize with the above checkpoint
variables_to_load = [ ... ] # List of model weight names to update.
from tensorflow.python.training.checkpoint_utils import load_checkpoint, list_variables
reader = load_checkpoint(checkpoint_filename)
for w in model.weights:
    name = w.name.split(':')[0] # See (b/29227106)
    if name in variables_to_load:
        print(f"Updating {name}")
        w.assign(reader.get_tensor(
            # (Optional) Handle variable renaming
            {'/var_name1/in/model': '/var_name1/in/checkpoint',
             '/var_name2/in/model': '/var_name2/in/checkpoint',
             # ... and so on
            }.get(name, name)))
Note: model.weights and list_variables may help to inspect variables in Model and in the checkpoint
Note also, that this method will not restore model's optimizer state.
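The name filtering and renaming used in the snippet can be tried out without TensorFlow. The sketch below covers only that selection step. The function name and the variable names are hypothetical, chosen to show how convolutional weights might be picked by prefix.

```python
def plan_assignments(weight_names, variables_to_load, renames=None):
    """Map model weight names to the checkpoint keys they should be read from.

    weight_names: names as reported by model.weights (may end in ':0').
    variables_to_load: model-side names that should be restored.
    renames: optional {model_name: checkpoint_name} overrides.
    """
    renames = renames or {}
    wanted = set(variables_to_load)
    plan = {}
    for full_name in weight_names:
        name = full_name.split(':')[0]  # drop the ':0' output suffix
        if name in wanted:
            plan[name] = renames.get(name, name)
    return plan


# Hypothetical names, chosen only for illustration.
weights = ['conv1/kernel:0', 'conv1/bias:0', 'fc1/kernel:0', 'fc1/bias:0']
conv_only = [w.split(':')[0] for w in weights if w.startswith('conv')]
plan = plan_assignments(weights, conv_only,
                        {'conv1/kernel': 'encoder/conv1/kernel'})
print(plan)
```

Each key in plan is a model weight to update and each value is the checkpoint entry to read it from. The fully-connected weights are left out and would keep their fresh initialization.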