Error when implementing tensorflow high level api - tensorflow

I am trying to implement tensorflows provided high level api's, specifically the baseline classifier. However when trying to train the model, I get the following
Error:
NotFoundError (see above for traceback): Key baseline/bias not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
Code:
import tensorflow as tf
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
def digit_cross():
# Number of classes, one class for each of 10 digits.
num_classes = 10
digit = datasets.load_digits()
x = digit.data
y = digit.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.3, random_state=42)
y_train_index = np.arange(y_train.size)
train_input_fn = tf.estimator.inputs.numpy_input_fn(
x={"x": np.array(x_train)},
y=np.array(y_train),
num_epochs=None,
shuffle=False)
# Build BaselineClassifier
classifier = tf.estimator.BaselineClassifier(n_classes=num_classes,
model_dir="./checkpoints_tutorial17-1/")
# Fit model.
classifier.train(train_input_fn)
digit_cross()

It seems that you have a checkpoint in model_dir="./checkpoints_tutorial17-1/", which is from another model and is not from a BaselineClassifier. To be specific, you have a checkpoint file and model.ckpt-* files in that folder.
As tensorflow documented:
model_dir: Directory to save model parameters, graph and etc. This can also be used to load checkpoints from the directory into a estimator to continue training a previously saved model. If PathLike object, the path will be resolved. If None, the model_dir in config will be used if set. If both are set, they must be same. If both are None, a temporary directory will be used.
Here, BaselineClassifier will first build a graph which uses baseline/bias. Then it finds out that there is a previous checkpoint in model_dir. It will try to load this checkpoint and you should see an info (if you've done tf.logging.set_verbosity(tf.logging.INFO)) saying something like
"INFO:tensorflow:Restoring parameters from .../checkpoints_tutorial17-1\model.ckpt-..."
Because this checkpoint in model_dir is not from a BaselineClassifier, it won't have baseline/bias. BaselineClassifier cannot find it and will thus throw an error.

Related

What parameters can be used in filepath for ModelCheckpoints (tf.Keras)?

I'm trying to train a Keras model and save model weighta at every epoch and patch.
I define a checkpoint as follows:
checkpoint_path='model_checkpoints_5000/checkpoints_{epoch:02d}_{batch:04d}'
checkpoint = ModelCheckpoint(filepath = checkpoint_5000_path,frequency = 5000)
and train the model:
model.fit(x=x_train, y=y_train, epochs=3, validation_data=(x_test, y_test),
batch_size=10, callbacks=[checkpoint])
But right after the forst iteration the error occurs:
KeyError: 'Failed to format this callback filepath: "model_checkpoints_5000/checkpoints_{epoch:02d}_{batch:04d}". Reason: \'batch\'
How can I have python add the batch number to the file name?
How to fide the list of other parameters that are available for output?
My setup: Windows 10, jupyter notebook in chrome, Python 3.5.4, Tensorflow 2.3.0, Keras is imported from Tensorflow.

Import Weights from Keras Classifier into TF Object Detection API

I have a classifier that I trained using keras that is working really well. It uses keras.applications.MobileNetV2.
This classifier is well trained on around 200 categories, and has a high accuracy.
However, I would like to use the feature extraction layers from this classifier as part of an object detection model.
I have been using the Tensorflow Object Detection API, and looking into the SSDLite+MobileNetV2 model. I can start to run training, but the training is very slow and the bulk of the loss comes from the classification stage.
What I would like to do is assign the weights from my keras .h5 model to the Feature Extraction layer of MobileNetV2 in Tensorflow, but I'm not sure of the best way to do that.
I can load the h5 file easily, and get a list of layer names:
import keras
keras_model = keras.models.load_model("my_classifier.h5")
keras_names = [l.name for l in keras_model.layers]
print(keras_names)
I can also restore the tensorflow checkpoint from the object detection API and export the layers with weights:
tf.reset_default_graph()
with tf.Session() as sess:
new_saver = tf.train.import_meta_graph('models/model.ckpt.meta')
what = new_saver.restore(sess, 'models/model.ckpt')
tf_names = []
for op in sess.graph.get_operations():
if "MobilenetV2" in op.name and "Assign" in op.name:
tf_names.append(op.name)
print(tf_names)
I cannot seem to get a good match-up between layer names from keras and from tensorflow. Even if I could I'm not sure of the next steps.
If anyone could give me some advice about the best way to approach this I would be very grateful.
Update:
I followed Sharky's suggestion below, with a slight modification:
new_saver = tf.train.import_meta_graph(os.path.join(keras_checkpoint_dir, 'keras_model.ckpt.meta'))
new_saver.restore(sess, os.path.join(keras_checkpoint_dir, tf.train.latest_checkpoint(keras_checkpoint_dir)))
However unfortunately I now get this error:
NotFoundError (see above for traceback): Restoring from checkpoint
failed. This is most likely due to a Variable name or other graph key
that is missing from the checkpoint. Please ensure that you have not
altered the graph expected based on the checkpoint. Original error:
Key
FeatureExtractor/MobilenetV2/expanded_conv_6/project/BatchNorm/gamma
not found in checkpoint [[node save/RestoreV2_295 (defined at
:7) = RestoreV2[dtypes=[DT_FLOAT],
_device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0,
save/RestoreV2_295/tensor_names,
save/RestoreV2_295/shape_and_slices)]] [[{{node
save/RestoreV2_196/_393}} = _Recvclient_terminated=false,
recv_device="/job:localhost/replica:0/task:0/device:GPU:0",
send_device="/job:localhost/replica:0/task:0/device:CPU:0",
send_device_incarnation=1, tensor_name="edge_789_save/RestoreV2_196",
tensor_type=DT_FLOAT,
_device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Any ideas on how to get rid of this error?
You can use tf.keras.estimator.model_to_estimator
estimator = tf.keras.estimator.model_to_estimator(keras_model=model, model_dir=path)
saver = tf.train.Saver()
with tf.Session() as sess:
saver.restore(sess, os.path.join(path/keras, tf.train.latest_checkpoint(path/keras)))
print(tf.global_variables())
This should do the job. Note that it will create a subdirectory inside originally specified path.

Can't import frozen graph with BatchNorm layer

I have trained a Keras model based on this repo.
After the training I save the model as checkpoint files like this:
sess=tf.keras.backend.get_session()
saver = tf.train.Saver()
saver.save(sess, current_run_path + '/checkpoint_files/model_{}.ckpt'.format(date))
Then I restore the graph from the checkpoint files and freeze it using the standard tf freeze_graph script. When I want to restore the frozen graph I get the following error:
Input 0 of node Conv_BN_1/cond/ReadVariableOp/Switch was passed float from Conv_BN_1/gamma:0 incompatible with expected resource
How can I fix this issue?
Edit: My problem is related to this question. Unfortunately, I can't use the workaround.
Edit 2:
I have opened an issue on github and created a gist to reproduce the error.
https://github.com/keras-team/keras/issues/11032
Just resolved the same issue. I connected this few answers: 1, 2, 3 and realized that issue originated from batchnorm layer working state: training or learning. So, in order to resolve that issue you just need to place one line before loading your model:
keras.backend.set_learning_phase(0)
Complete example, to export model
import tensorflow as tf
from tensorflow.python.framework import graph_io
from tensorflow.keras.applications.inception_v3 import InceptionV3
def freeze_graph(graph, session, output):
with graph.as_default():
graphdef_inf = tf.graph_util.remove_training_nodes(graph.as_graph_def())
graphdef_frozen = tf.graph_util.convert_variables_to_constants(session, graphdef_inf, output)
graph_io.write_graph(graphdef_frozen, ".", "frozen_model.pb", as_text=False)
tf.keras.backend.set_learning_phase(0) # this line most important
base_model = InceptionV3()
session = tf.keras.backend.get_session()
INPUT_NODE = base_model.inputs[0].op.name
OUTPUT_NODE = base_model.outputs[0].op.name
freeze_graph(session.graph, session, [out.op.name for out in base_model.outputs])
to load *.pb model:
from PIL import Image
import numpy as np
import tensorflow as tf
# https://i.imgur.com/tvOB18o.jpg
im = Image.open("/home/chichivica/Pictures/eagle.jpg").resize((299, 299), Image.BICUBIC)
im = np.array(im) / 255.0
im = im[None, ...]
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
graph_def.ParseFromString(f.read())
graph = tf.Graph()
with graph.as_default():
net_inp, net_out = tf.import_graph_def(
graph_def, return_elements=["input_1", "predictions/Softmax"]
)
with tf.Session(graph=graph) as sess:
out = sess.run(net_out.outputs[0], feed_dict={net_inp.outputs[0]: im})
print(np.argmax(out))
This is bug with Tensorflow 1.1x and as another answer stated, it is because of the internal batch norm learning vs inference state. In TF 1.14.0 you actually get a cryptic error when trying to freeze a batch norm layer.
Using set_learning_phase(0) will put the batch norm layer (and probably others like dropout) into inference mode and thus the batch norm layer will not work during training, leading to reduced accuracy.
My solution is this:
Create the model using a function (do not use K.set_learning_phase(0)):
def create_model():
inputs = Input(...)
...
return model
model = create_model()
Train model
Save weights:
model.save_weights("weights.h5")
Clear session (important so layer names are the same) and set learning phase to 0:
K.clear_session()
K.set_learning_phase(0)
Recreate model and load weights:
model = create_model()
model.load_weights("weights.h5")
Freeze as before
Thanks for pointing the main issue! I found that keras.backend.set_learning_phase(0) to be not working sometimes, at least in my case.
Another approach might be: for l in keras_model.layers: l.trainable = False

How can I convert a trained Tensorflow model to Keras?

I have a trained Tensorflow model and weights vector which have been exported to protobuf and weights files respectively.
How can I convert these to JSON or YAML and HDF5 files which can be used by Keras?
I have the code for the Tensorflow model, so it would also be acceptable to convert the tf.Session to a keras model and save that in code.
I think the callback in keras is also a solution.
The ckpt file can be saved by TF with:
saver = tf.train.Saver()
saver.save(sess, checkpoint_name)
and to load checkpoint in Keras, you need a callback class as follow:
class RestoreCkptCallback(keras.callbacks.Callback):
def __init__(self, pretrained_file):
self.pretrained_file = pretrained_file
self.sess = keras.backend.get_session()
self.saver = tf.train.Saver()
def on_train_begin(self, logs=None):
if self.pretrian_model_path:
self.saver.restore(self.sess, self.pretrian_model_path)
print('load weights: OK.')
Then in your keras script:
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
restore_ckpt_callback = RestoreCkptCallback(pretrian_model_path='./XXXX.ckpt')
model.fit(x_train, y_train, batch_size=128, epochs=20, callbacks=[restore_ckpt_callback])
That will be fine.
I think it is easy to implement and hope it helps.
Francois Chollet, the creator of keras, stated in 04/2017 "you cannot turn an arbitrary TensorFlow checkpoint into a Keras model. What you can do, however, is build an equivalent Keras model then load into this Keras model the weights"
, see https://github.com/keras-team/keras/issues/5273 . To my knowledge this hasn't changed.
A small example:
First, you can extract the weights of a tensorflow checkpoint like this
PATH_REL_META = r'checkpoint1.meta'
# start tensorflow session
with tf.Session() as sess:
# import graph
saver = tf.train.import_meta_graph(PATH_REL_META)
# load weights for graph
saver.restore(sess, PATH_REL_META[:-5])
# get all global variables (including model variables)
vars_global = tf.global_variables()
# get their name and value and put them into dictionary
sess.as_default()
model_vars = {}
for var in vars_global:
try:
model_vars[var.name] = var.eval()
except:
print("For var={}, an exception occurred".format(var.name))
It might also be of use to export the tensorflow model for use in tensorboard, see https://stackoverflow.com/a/43569991/2135504
Second, you build you keras model as usually and finalize it by "model.compile". Pay attention that you need to give you define each layer by name and add it to the model after that, e.g.
layer_1 = keras.layers.Conv2D(6, (7,7), activation='relu', input_shape=(48,48,1))
net.add(layer_1)
...
net.compile(...)
Third, you can set the weights with the tensorflow values, e.g.
layer_1.set_weights([model_vars['conv7x7x1_1/kernel:0'], model_vars['conv7x7x1_1/bias:0']])
Currently, there is no direct in-built support in Tensorflow or Keras to convert the frozen model or the checkpoint file to hdf5 format.
But since you have mentioned that you have the code of Tensorflow model, you will have to rewrite that model's code in Keras. Then, you will have to read the values of your variables from the checkpoint file and assign it to Keras model using layer.load_weights(weights) method.
More than this methodology, I would suggest to you to do the training directly in Keras as it claimed that Keras' optimizers are 5-10% times faster than Tensorflow's optimizers. Other way is to write your code in Tensorflow with tf.contrib.keras module and save the file directly in hdf5 format.
Unsure if this is what you are looking for, but I happened to just do the same with the newly released keras support in TF 1.2. You can find more on the API here: https://www.tensorflow.org/api_docs/python/tf/contrib/keras
To save you a little time, I also found that I had to include keras modules as shown below with the additional python.keras appended to what is shown in the API docs.
from tensorflow.contrib.keras.python.keras.models import Sequential
Hope that helps get you where you want to go. Essentially once integrated in, you then just handle your model/weight export as usual.

Keras: Load checkpoint weights HDF5 generated by multiple GPUs

Checkpoint snippet:
checkpointer = ModelCheckpoint(filepath=os.path.join(savedir, "mid/weights.{epoch:02d}.hd5"), monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False)
hist = model.fit_generator(
gen.generate(batch_size = batch_size, nb_classes=nb_classes), samples_per_epoch=593920, nb_epoch=nb_epoch, verbose=1, callbacks=[checkpointer], validation_data = gen.vld_generate(VLD_PATH, batch_size = 64, nb_classes=nb_classes), nb_val_samples=10000
)
I trained my model on a multiple GPU host which dumps mid files in HDF5 format. When I loaded them on a single GPU machine with keras.load_weights('mid'), an error was raised:
Using TensorFlow backend.
Traceback (most recent call last):
File "server.py", line 171, in <module>
model = load_model_and_weights('zhch.yml', '7_weights.52.hd5')
File "server.py", line 16, in load_model_and_weights
model.load_weights(os.path.join('model', weights_name))
File "/home/lz/code/ProjectGo/meta/project/libpolicy-server/.virtualenv/lib/python3.5/site-packages/keras/engine/topology.py", line 2701, in load_weights
self.load_weights_from_hdf5_group(f)
File "/home/lz/code/ProjectGo/meta/project/libpolicy-server/.virtualenv/lib/python3.5/site-packages/keras/engine/topology.py", line 2753, in load_weights_from_hdf5_group
str(len(flattened_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 1 layers into a model with 21 layers.
Is there any way to load checkpoint weights generated by multiple GPUs on a single GPU machine? It seems that no issue of Keras discussed this problem thus any help would be appreciated.
You can load your model on a single GPU like this:
from keras.models import load_model
multi_gpus_model = load_model('mid')
origin_model = multi_gpus_model.layers[-2] # you can use multi_gpus_model.summary() to see the layer of the original model
origin_model.save_weights('single_gpu_model.hdf5')
'single_gpu_model.hdf5' is the file that you can load to the single GPU machine model.
Try this function:
def keras_model_reassign_weights(model_cpu,model_gpu):
weights_temp ={}
print('_'*5,'Collecting weights from GPU model','_'*5)
for layer in model_gpu.layers:
try:
for layer_unw in layer.layers:
#print('Weights extracted for: ',layer_unw.name)
weights_temp[layer_unw.name] = layer_unw.get_weights()
break
except:
print('Skipped: ',layer.name)
print('_'*5,'Writing weights to CPU model','_'*5)
for layer in model_cpu.layers:
try:
layer.set_weights(weights_temp[layer.name])
#print(layer.name,'Done!')
except:
print(layer.name,'weights does not set for this layer!')
return model_cpu
But you need to load weights to your gpu model first:
#load or initialize your keras multi-gpu model
model_gpu = None
#load or initialize your keras model with the same structure, without using keras.multi_gpu function
model_cpu = None
#load weights into multigpu model
model_gpu.load_weights(r'gpu_model_best_checkpoint.hdf5')
#execute function
model_cpu = keras_model_reassign_weights(model_cpu,model_gpu)
#save obtained weights for cpu model
model_cpu.save_weights(r'CPU_model.hdf5')
After transferring you can use weights with a single GPU or CPU model.