TensorFlow Probability - Saving and restoring checkpoints for a Bayesian neural network - tensorflow

I'd been looking at the TensorFlow Probability library and trying to modify its Bayesian neural network example, hoping that I could save checkpoints and then restore them. I first tried tf.train.Checkpoint; although I was not getting any error, neither when saving nor when restoring, training did not seem to resume from the previous checkpoint, as the accuracy was a completely different value.
I then tried tf.keras.Model.save, which again does save a file, but when trying to restore I get the error ValueError: Unknown layer: Conv2DFlipout while it is deserializing the layer.
To be honest, I don't know which way to go now; could somebody point me in the right direction?
Thanks!
Giovanna
This is what I have so far to restore:
if FLAGS.architecture == "resnet":
    model_fn = bayesian_resnet.bayesian_resnet
else:
    model_fn = bayesian_vgg.bayesian_vgg

model = model_fn(
    IMAGE_SHAPE,
    num_classes=4,
    kernel_posterior_scale_mean=FLAGS.kernel_posterior_scale_mean,
    kernel_posterior_scale_constraint=FLAGS.kernel_posterior_scale_constraint)
print(images)

# check if a saved checkpoint exists
exists = os.path.isfile(FLAGS.model_dir + "checkpoint.hdf5")
if exists:
    model = tf.keras.models.load_model(FLAGS.model_dir + "checkpoint.hdf5")

logits = model(images)
labels_distribution = tfd.Categorical(logits=logits)

# Perform KL annealing. The optimal number of annealing steps
# depends on the dataset and architecture.
t = tf.Variable(0.0)
# kl_regularizer = t / (FLAGS.kl_annealing * len(x_train) / FLAGS.batch_size)
...
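One avenue worth trying for the ValueError: Unknown layer: Conv2DFlipout is the custom_objects argument of tf.keras.models.load_model, which tells Keras how to deserialize layers it does not know about. A minimal sketch, assuming the serialized name in the error maps to tfp.layers.Convolution2DFlipout:

import tensorflow as tf
import tensorflow_probability as tfp

# Map the unknown layer name from the error message to the TFP class
# (this exact name/class pairing is an assumption):
model = tf.keras.models.load_model(
    FLAGS.model_dir + "checkpoint.hdf5",
    custom_objects={"Conv2DFlipout": tfp.layers.Convolution2DFlipout})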

Related

Cannot load model weights in TensorFlow 2

I cannot load model weights after saving them in TensorFlow 2.2. The weights appear to be saved correctly (I think); however, I fail to load the pre-trained model.
My current code is:
segmentor = sequential_model_1()
discriminator = sequential_model_2()

def save_model(ckp_dir):
    # create the directory, if it does not exist:
    utils.safe_mkdir(ckp_dir)
    # save weights
    segmentor.save_weights(os.path.join(ckp_dir, 'checkpoint-segmentor'))
    discriminator.save_weights(os.path.join(ckp_dir, 'checkpoint-discriminator'))

def load_pretrained_model(ckp_dir):
    try:
        segmentor.load_weights(os.path.join(ckp_dir, 'checkpoint-segmentor'), skip_mismatch=True)
        discriminator.load_weights(os.path.join(ckp_dir, 'checkpoint-discriminator'), skip_mismatch=True)
        print('Loading pre-trained model from: {0}'.format(ckp_dir))
    except ValueError:
        print('No pre-trained model available.')
Then I have the training loop:
# training loop:
for epoch in range(num_epochs):
    for image, label in dataset:
        train_step()
    # save the best model found during training:
    if this_is_the_best_model_on_validation_set():
        save_model(ckp_dir='logs_dir')
And then, at the end of the training "for loop", I want to load the best model and do a test with it. Hence, I run:
# load saved model and do a test:
load_pretrained_model(ckp_dir='logs_dir')
test()
However, this results in a ValueError. I checked the directory where the weights should be saved, and there they are!
Any idea what is wrong with my code? Am I loading the weights incorrectly?
Thank you!
OK, here is your problem: the try/except block you have is obscuring the real issue. Removing it surfaces the actual ValueError:
ValueError: When calling model.load_weights, skip_mismatch can only be set to True when by_name is True.
There are two ways to mitigate this: you can either call load_weights with by_name=True, or remove skip_mismatch=True, depending on your needs. Either fix works for me when testing your code.
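For reference, a minimal sketch of the loader under the second option (dropping skip_mismatch, since the architecture is identical across runs):

def load_pretrained_model(ckp_dir):
    # With matching architectures, plain load_weights is enough:
    segmentor.load_weights(os.path.join(ckp_dir, 'checkpoint-segmentor'))
    discriminator.load_weights(os.path.join(ckp_dir, 'checkpoint-discriminator'))
    print('Loaded pre-trained model from: {0}'.format(ckp_dir))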
Another consideration is that when you store both the discriminator and segmentor checkpoints in the same log directory, you overwrite the checkpoint file each time. That file contains two strings giving the paths to the most recent model checkpoint files; since you save the discriminator second, the file will always reference the discriminator with no mention of the segmentor. You can mitigate this by storing each model in its own subdirectory of the log directory, i.e.
logs_dir/
  + discriminator/
      + checkpoint
      + ...
  + segmentor/
      + checkpoint
      + ...
That said, even in its current state your code would still work in this case, since you pass each checkpoint prefix to load_weights directly.
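A minimal sketch of save_model under that layout, reusing the question's utils.safe_mkdir helper (directory names are illustrative):

def save_model(ckp_dir):
    # One subdirectory per model so the two checkpoint files don't collide:
    utils.safe_mkdir(os.path.join(ckp_dir, 'segmentor'))
    utils.safe_mkdir(os.path.join(ckp_dir, 'discriminator'))
    segmentor.save_weights(os.path.join(ckp_dir, 'segmentor', 'checkpoint-segmentor'))
    discriminator.save_weights(os.path.join(ckp_dir, 'discriminator', 'checkpoint-discriminator'))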

Saving tf.trainable_variables() using convert_variables_to_constants

I have a Keras model that I would like to convert to a Tensorflow protobuf (e.g. saved_model.pb).
This model comes from transfer learning on the VGG-19 network, in which the head was cut off and retrained with fully-connected + softmax layers while the rest of the VGG-19 network was frozen.
I can load the model in Keras, and then use keras.backend.get_session() to run the model in tensorflow, generating the correct predictions:
frame = preprocess(cv2.imread("path/to/img.jpg"))
keras_model = keras.models.load_model("path/to/keras/model.h5")
keras_prediction = keras_model.predict(frame)
print(keras_prediction)

with keras.backend.get_session() as sess:
    tvars = tf.trainable_variables()
    output = sess.graph.get_tensor_by_name('Softmax:0')
    input_tensor = sess.graph.get_tensor_by_name('input_1:0')
    tf_prediction = sess.run(output, {input_tensor: frame})
    print(tf_prediction)  # this matches keras_prediction exactly
If I don't include the line tvars = tf.trainable_variables(), then the tf_prediction variable is completely wrong and doesn't match the output from keras_prediction at all. In fact, all the values in the output (a single array with 4 probability values) are exactly the same (~0.25, adding to 1). This made me suspect that the weights for the head are just initialized to 0 if tf.trainable_variables() is not called first, which was confirmed after inspecting the model variables. In any case, calling tf.trainable_variables() causes the TensorFlow prediction to be correct.
The problem is that when I try to save this model, the variables from tf.trainable_variables() don't actually get saved to the .pb file:
with keras.backend.get_session() as sess:
    tvars = tf.trainable_variables()
    constant_graph = graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ['Softmax'])
    graph_io.write_graph(constant_graph, './', 'saved_model.pb', as_text=False)
What I am asking is: how can I save a Keras model as a TensorFlow protobuf with the tf.trainable_variables() intact?
Thanks so much!
So your approach of freezing the variables in the graph (converting them to constants) should work, but it isn't necessary and is trickier than the other approaches (more on this below). If you want graph freezing for some reason (e.g. exporting to a mobile device), I'd need more details to help debug, as I'm not sure what implicit stuff Keras is doing behind the scenes with your graph. However, if you just want to save and load a graph later, I can explain how to do that (though no guarantees that whatever Keras is doing won't interfere; happy to help debug that).
So there are actually two formats at play here. One is the GraphDef, which is used for checkpointing, as it does not contain metadata about inputs and outputs. The other is a MetaGraphDef, which contains a GraphDef plus metadata; that metadata is useful for prediction and for running a ModelServer (from tensorflow/serving).
In either case you need to do more than just call graph_io.write_graph, because the variables are usually stored outside the GraphDef.
There are wrapper libraries for both these use cases. tf.train.Saver is primarily used for saving and restoring checkpoints.
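A minimal sketch of that checkpoint route (paths are illustrative; this assumes the same graph is rebuilt before restoring):

saver = tf.train.Saver()
# Writes variable values to ./ckpt; the graph structure itself is not stored here
saver.save(sess, './ckpt/model.ckpt')

# Later, after rebuilding the same graph in a new session:
saver.restore(sess, './ckpt/model.ckpt')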
However, since you want prediction, I would suggest using a tf.saved_model.builder.SavedModelBuilder to build a SavedModel binary. I've provided some boilerplate for this below:
from tensorflow.python.saved_model.signature_constants import DEFAULT_SERVING_SIGNATURE_DEF_KEY as DEFAULT_SIG_DEF

builder = tf.saved_model.builder.SavedModelBuilder('./mymodel')
with keras.backend.get_session() as sess:
    output = sess.graph.get_tensor_by_name('Softmax:0')
    input_tensor = sess.graph.get_tensor_by_name('input_1:0')
    sig_def = tf.saved_model.signature_def_utils.predict_signature_def(
        {'input': input_tensor},
        {'output': output}
    )
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            DEFAULT_SIG_DEF: sig_def
        }
    )
builder.save()
After running this code you should have a mymodel/saved_model.pb file as well as a directory mymodel/variables/ with protobufs corresponding to the variable values.
Then to load the model again, simply use tf.saved_model.loader:
# Does Keras give you the ability to start with a fresh graph?
# If not, you'll need to do this in a separate program to avoid
# conflicts with the old default graph.
with tf.Session(graph=tf.Graph()) as sess:
    meta_graph_def = tf.saved_model.loader.load(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        './mymodel'
    )
    # From this point variables and graph structure are restored
    sig_def = meta_graph_def.signature_def[DEFAULT_SIG_DEF]
    print(sess.run(sig_def.outputs['output'].name,
                   feed_dict={sig_def.inputs['input'].name: frame}))
Obviously there's a more efficient prediction available with this code through tensorflow/serving, or Cloud ML Engine, but this should work.
It's possible that Keras is doing something under the hood which will interfere with this process as well, and if so we'd like to hear about it (and I'd like to make sure that Keras users are able to freeze graphs as well, so if you want to send me a gist with your full code, maybe I can find someone who knows Keras well to help me debug).
EDIT: You can find an end to end example of this here: https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/keras/trainer/model.py#L85

TensorFlow: load a pre-trained model and use a different optimizer

I want to load a pre-trained model (optimized by AdadeltaOptimizer) and continue training with SGD (GradientDescentOptimizer). The models are saved and loaded with the TensorLayer API:
save model:
import tensorlayer as tl
tl.files.save_npz(network.all_params,
                  name=model_dir + "model-%d.npz" % global_step)
load model:
load_params = tl.files.load_npz(path=resume_dir + '/', name=model_name)
tl.files.assign_params(sess, load_params, network)
If I continue training with Adadelta, the training loss (cross entropy) looks normal (it starts at a value close to that of the loaded model). However, if I change the optimizer to SGD, the training loss is as large as that of a newly initialized model.
I took a look at the model-xxx.npz file from tl.files.save_npz. It only saves all the model parameters as ndarrays. I'm not sure how the optimizer or learning rate is involved here.
You would probably have to import into a variable the loss function/cross-entropy tensor that previously fed into your Adadelta optimizer. Then just feed it through your SGD optimizer instead.
saver = tf.train.import_meta_graph('filename.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
graph = tf.get_default_graph()
cross_entropy = graph.get_tensor_by_name("entropy:0")  # tensor to import

optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
In this case, I had tagged the cross-entropy tensor with the name entropy before training my pre-trained model, like so:
tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv), name='entropy')
If you are unable to make changes to your pre-trained model, you can obtain the list of tensors in your model (after you have imported it) from the graph and deduce which tensor you require. I have no experience with TensorLayer, so this guide is meant to provide understanding rather than exact steps. You can take a look at TensorLayer-Layers; they should explain how to obtain your tensor. As TensorLayer is built on top of TensorFlow, most of the functions should still be available.
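If it helps, a minimal sketch for listing operation names in an imported graph to deduce the tensor you need (the substring filter is illustrative):

graph = tf.get_default_graph()
for op in graph.get_operations():
    if 'entropy' in op.name:  # illustrative filter; drop it to list everything
        print(op.name)        # the corresponding tensor is then "<op_name>:0"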
You can specify the parameters you want to save in your checkpoint file.
save_npz([save_list, name, sess])
In the save_list you specify only the network parameters, which do not include the optimizer parameters; thus no learning rate or any other optimizer state is saved.
If you want to save the current learning rate (in order to use the exact same learning rate when you restore the model), you have to add it to the save_list, like this:
save_npz(network.all_params + [learning_rate])
(Note the concatenation rather than all_params.extend(...): list.extend modifies the list in place and returns None. I am supposing that all_params is a Python list.)
Since you want to change the optimizer, I suggest you save only the learning_rate as an optimizer parameter, and not any other variables that the optimizer creates.
That way, you'll be able to change the optimizer and still restore the model; otherwise (if you put any other optimizer variables in your checkpoint), the graph you try to restore won't find the variables in which to place the saved values, and you won't be able to change the optimizer.
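A sketch of that round trip, assuming all_params is a Python list and reusing the load/assign calls from the question:

# Save the network parameters with the learning rate appended as the last entry
tl.files.save_npz(network.all_params + [learning_rate], name='model.npz')

# Restore: peel the learning rate off the end, assign the rest to the network
params = tl.files.load_npz(name='model.npz')
tl.files.assign_params(sess, params[:-1], network)
restored_lr = params[-1]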
For the TensorLayer 2.0 version, use this (from https://tensorlayer.readthedocs.io/en/latest/user/get_start_advance.html#pre-trained-cnn):
vgg = tl.models.vgg16(pretrained=True)
img = tl.vis.read_image('data/tiger.jpeg')
img = tl.prepro.imresize(img, (224, 224)).astype(np.float32) / 255
output = vgg(img, is_train=False)

Use inception v3 with batch of images in tensorflow

In one of my computer vision projects, I use the public pre-trained Inception-v3 available here: http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz. This network sits at the beginning of my classification chain (a lot of stuff is performed on the logits produced by the network). I would like to feed this network a batch of images (instead of processing images sequentially) in order to make it faster.
However, the provided network has been "frozen", and it can only process one image at a time.
Is there any solution to "unfreeze" a graph and adapt it so that I can use it on a batch of images?
(N.B.: I found related topics on the internet, but they all suggest taking a more recent network, available for instance here: http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz. This is not what I would like to do, since a lot of stuff has been tuned on the output of the frozen model.)
Not sure if this is too late, but here is the code snippet that I used:
# First load the model into an old graph
proto_file = ...  # downloaded inception protofile
graph_def = tf.GraphDef.FromString(open(proto_file, 'rb').read())
to_delete = {"DecodeJpeg", "Cast", "ExpandDims", "pool_3/_reshape", "softmax"}
graph_def = delete_ops_from_graph(graph_def, to_delete)

new_graph = tf.Graph()
with new_graph.as_default():
    x = tf.placeholder(tf.uint8, [None, None, None, 3], name="batched_inputs")
    x_cast = tf.cast(x, dtype=tf.float32)
    y = tf.import_graph_def(graph_def, input_map={"ExpandDims:0": x_cast},
                            return_elements=["pool_3:0"], name="")
...
Now new_graph is a graph with a batch dimension (it takes in a 4-D NHWC tensor). Note that this is good if you want to use inception-2015-12-05.tgz as a feature extractor; you would need to take the output from output = new_graph.get_tensor_by_name("pool_3:0").
For the definition of delete_ops_from_graph, see Tensorflow: delete nodes from graph
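For reference, a minimal sketch of what delete_ops_from_graph might look like, following the approach of that linked answer (an assumption, not the exact implementation):

def delete_ops_from_graph(graph_def, to_delete):
    # Build a new GraphDef, keeping only nodes whose names are not in to_delete
    out = tf.GraphDef()
    for node in graph_def.node:
        if node.name not in to_delete:
            out.node.add().CopyFrom(node)
    return out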

serving a classification model in tensorflow

I am trying to build a classifier in TensorFlow. I have written the model and tested it, and it works, but I would like to build it for production; however, I can't seem to find a way to pass input into the model.
This is the code I used to train and test:
# training the neural network
def get_train_inputs():
    x = tf.constant(tr_features)
    y = tf.constant(tr_labels)
    return x, y

# fit the model using 1000 training steps
classifier.fit(input_fn=get_train_inputs, steps=1000)

# testing the neural network
def get_test_inputs():
    x = tf.constant(ts_features)
    y = tf.constant(ts_labels)
    return x, y

# calculate accuracy
accuracy_score = classifier.evaluate(input_fn=get_test_inputs, steps=1000)["accuracy"]
print('Test accuracy : ', format(accuracy_score))
I have tested it by passing test data to the predict_classes function and that works. My question is: how do I build a builder for this, so I can pass in data from an external application?
# test a prediction
def new_sample():
    return np.array(testing, dtype=np.float32)

predictions = list(classifier.predict(input_fn=new_sample))
print('prediction : ', format(predictions))
I would suggest doing an implementation much like this:
ex.
This is fairly similar to what you want to do, though you would need to implement checkpoints (Ctrl+F "checkpoint file"). Much of the code there is specific to his/her program, but the gist of it is to create a file eval.py which runs a net off of a previously saved checkpoint. In the eval.py file, you can input whatever file you wish.
The simplest way is TensorFlow Serving.
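A minimal sketch of exporting the classifier as a SavedModel that TensorFlow Serving can load, assuming classifier is a tf.estimator-style estimator (the feature name 'x' and num_features are assumptions; match them to your feature columns):

import tensorflow as tf

def serving_input_receiver_fn():
    # 'x' and num_features are hypothetical; adjust to your feature columns
    features = {'x': tf.placeholder(tf.float32, shape=[None, num_features], name='input')}
    return tf.estimator.export.ServingInputReceiver(features, features)

# Writes a timestamped SavedModel directory under ./export, which
# TensorFlow Serving can then serve directly
classifier.export_savedmodel('./export', serving_input_receiver_fn)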