Creating a Slim classifier using pretrained ResNet V2 model - tensorflow

I am trying to create an image classifier that utilizes the pre-trained ResNet V2 model provided in the slim documentation.
Here is the code so far:
import tensorflow as tf
slim = tf.contrib.slim
from PIL import Image
from inception_resnet_v2 import *
import numpy as np
checkpoint_file = 'inception_resnet_v2_2016_08_30.ckpt'
sample_images = ['carrot.jpg']
input_tensor = tf.placeholder(tf.float32, shape=(None,299,299,3), name='input_image')
scaled_input_tensor = tf.scalar_mul((1.0/255), input_tensor)
scaled_input_tensor = tf.subtract(scaled_input_tensor, 0.5)
scaled_input_tensor = tf.multiply(scaled_input_tensor, 2.0)
variables_to_restore = slim.get_model_variables()
print(variables_to_restore)
init_fn = slim.assign_from_checkpoint_fn(
    checkpoint_file,
    slim.get_model_variables('InceptionResnetV2'))
sess = tf.Session()
init_fn(sess)
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(scaled_input_tensor, is_training=False)
for image in sample_images:
    im = Image.open(image).resize((299,299))
    im = np.array(im)
    im = im.reshape(-1,299,299,3)
    predict_values, logit_values = sess.run([end_points['Predictions'], logits], feed_dict={input_tensor: im})
    print(np.max(predict_values), np.max(logit_values))
    print(np.argmax(predict_values), np.argmax(logit_values))
The problem is I keep getting this error:
Traceback (most recent call last):
  File "./classify.py", line 21, in <module>
    slim.get_model_variables('InceptionResnetV2'))
  File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 584, in assign_from_checkpoint_fn
    saver = tf_saver.Saver(var_list, reshape=reshape_variables)
  File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1040, in __init__
    self.build()
  File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1061, in build
    raise ValueError("No variables to save")
ValueError: No variables to save
So it seems TF/Slim is unable to find any variables and this is made clear when I call:
variables_to_restore = slim.get_model_variables()
print(variables_to_restore)
As it outputs an empty list.
How can I go about using the pre-trained model?

This happens because you haven't constructed the model in your graph yet, so there are no variables whose names start with "InceptionResnetV2" for the saver to capture and restore.
I believe you should put the model construction before any call to slim.get_model_variables() / slim.get_variables_to_restore().
For instance:
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(scaled_input_tensor, is_training=False)

variables_to_restore = slim.get_model_variables()
This way the model's variables are created first, and you should see that variables_to_restore is no longer empty.
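Putting the pieces together, the reordered version of the question's code would look roughly like this (a sketch only; the checkpoint path and the inception_resnet_v2 module are the ones from the question):
import tensorflow as tf
import numpy as np
from PIL import Image
from inception_resnet_v2 import *

slim = tf.contrib.slim

checkpoint_file = 'inception_resnet_v2_2016_08_30.ckpt'

input_tensor = tf.placeholder(tf.float32, shape=(None, 299, 299, 3), name='input_image')
scaled_input_tensor = tf.scalar_mul((1.0/255), input_tensor)
scaled_input_tensor = tf.subtract(scaled_input_tensor, 0.5)
scaled_input_tensor = tf.multiply(scaled_input_tensor, 2.0)

# 1. Build the model first, so its variables actually exist in the graph.
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(scaled_input_tensor, is_training=False)

# 2. Only now can slim find the InceptionResnetV2 variables.
variables_to_restore = slim.get_model_variables('InceptionResnetV2')
init_fn = slim.assign_from_checkpoint_fn(checkpoint_file, variables_to_restore)

# 3. Restore the weights into a session and run inference as before.
sess = tf.Session()
init_fn(sess)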

You need to manually add the model variables.
Try this
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(scaled_input_tensor, is_training=False)

# Add model variables
for var in tf.global_variables(scope='inception_resnet_v2'):
    slim.add_model_variable(var)

Related

Keras model compiles well outside SageMaker, but as soon as I try to train it in SageMaker with the TensorFlow instance I get an error

Here is the error: ValueError: Output tensors to a Model must be the output of a TensorFlow Layer (thus holding past layer metadata)
I am trying to train and deploy a multi-input Keras model with AWS SageMaker, but there seem to be some showstopper issues with the required libraries, which expect a single input for Keras models.
I have 3 categorical input variables and one numeric variable. The target variable is also categorical. I have no test or validation data; I am only interested in training without errors.
I merged the arrays after data preparation as follows and then stored them in S3:
input_train = np.column_stack((input_cat1, input_cat2, input_num, input_cat3))
training_input_path = sage_maker_session.upload_data('data/training.npz', key_prefix=prefix + training_folder)
print(training_input_path)
s3://sagemaker-eu-central-1-xxxxxxxxxxxxx/user_tracking/training/training.npz
In the train.py script (the entry_point), I fetch the file from S3 again. The train.py file itself compiles without problems, just as it does outside SageMaker.
%%writefile train.py
### import library ###
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=60)
parser.add_argument('--batch-size', type=int, default=50)
parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
#parser.add_argument('--model-dir', type=str)
parser.add_argument('--training', type=str, default=os.environ['SM_CHANNEL_TRAINING'])
#parser.add_argument('--training', type=str, default='data')
args, _ = parser.parse_known_args()
epochs = args.epochs
batch_size = args.batch_size
model_dir = args.model_dir
training_dir = args.training
input_train =np.load(os.path.join(training_dir, 'training.npz'))['train_input']
target =np.load(os.path.join(training_dir, 'training.npz'))['train_output']
input_cat1 = input_train[:,0].astype(np.int32)
input_cat2 = input_train[:,1].astype(np.int32)
input_cat3 = input_train[:,3:].astype(np.int32)
input_num = input_train[:,2].astype(np.float32)
n_steps = 2 # number of timesteps in each sample
num_unique_os = 5 #len(le_betriebsystem.classes_)+1
num_unique_browser = 10 #len(le_browser.classes_)+1
num_unique_actions = 210 #len(le_actionen.classes_)+1
#numeric Input
numerical_input = tf.keras.Input(shape=(1,), name='numeric_input')
#categorical Input
os_input = tf.keras.Input(shape=(1,), name='os_input')
browser_input = tf.keras.Input(shape=(1,), name='browser_input')
action_input= tf.keras.Input(shape=(max_seq_len,), name='action_input')
emb_os = tf.keras.layers.Embedding(num_unique_os, 32)(os_input)
emb_browser = tf.keras.layers.Embedding(num_unique_browser, 32)(browser_input)
emb_actions = tf.keras.layers.Embedding(num_unique_actions, 64)(action_input)
actions_repr = tf.keras.layers.LSTM(300, return_sequences=True)(emb_actions)
actions_repr = tf.keras.layers.LSTM(200)(emb_actions)
emb_os = tf.squeeze(emb_os, axis=1)
emb_browser = tf.squeeze(emb_browser, axis=1)
activity_repr = tf.keras.layers.Concatenate()([emb_os, emb_browser, actions_repr,
numerical_input])
x = tf.keras.layers.RepeatVector(n_steps)(activity_repr)
x = tf.keras.layers.LSTM(288, return_sequences=True)(x)
next_n_actions = tf.keras.layers.Dense(num_unique_actions-1, activation='softmax')(x)
model = tf.keras.Model(inputs=[numerical_input, os_input, browser_input, action_input], outputs =
next_n_actions)
model.summary()
model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])
history = model.fit({'numeric_input': input_num,
'os_input': input_cat1,
'browser_input': input_cat2,
'action_input': input_cat3}, target, batch_size=50, epochs=130)
tf.saved_model.simple_save(
tf.keras.backend.get_session(),
os.path.join(model_dir, '1'),
inputs={'inputs': model.input},
outputs={t.name: t for t in model.outputs})
I received this (screenshots attached in the original post): the model summary and the metric tendency.
When trying to do the whole thing again with the TensorFlow instance, the following error occurred:
Traceback (most recent call last):
  File "train.py", line 105, in <module>
    model = tf.keras.Model(inputs=[numerical_input, os_input, browser_input, action_input], outputs=next_n_actions)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 121, in __init__
    super(Model, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py", line 80, in __init__
    self._init_graph_network(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpointable/base.py", line 474, in _method_wrapper
    method(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py", line 224, in _init_graph_network
    '(thus holding past layer metadata). Found: ' + str(x))
ValueError: Output tensors to a Model must be the output of a TensorFlow Layer (thus holding past layer metadata). Found: Tensor("dense/truediv:0", shape=(?, 2, 209), dtype=float32)

2021-03-08 21:52:04,761 sagemaker-containers ERROR ExecuteUserScriptError:
Command "/usr/bin/python train.py --batch-size 50 --epochs 150 --model_dir s3://sagemaker-eu-central-1-xxxxxxxxxxxxxxxxx/sagemaker-tensorflow-scriptmode
I used the TensorFlow versions '2.0.4' and '1.15.4' respectively, with the kernels conda_tensorflow_p36 and conda_tensorflow2_p36.
For more of the code, see: https://gitlab.com/patricksardin08/data-science/-/tree/master/
Please, I need your help. I'm available around the clock if anyone wants me to explain the question in more detail.

Tensorflow cannot quantize reshape function

I want to train my model quantization-aware. However, tensorflow_model_optimization cannot quantize the tf.reshape function and throws an error.
tensorflow version: '2.4.0-dev20200903'
python version: 3.6.9
the code:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '3'
from tensorflow.keras.applications import VGG16
import tensorflow_model_optimization as tfmot
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
quantize_model = tfmot.quantization.keras.quantize_model
inputs = keras.Input(shape=(784,))
# img_inputs = keras.Input(shape=(32, 32, 3))
dense = layers.Dense(64, activation="relu")
x = dense(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(10)(x)
outputs = tf.reshape(outputs, [-1, 2, 5])
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
# keras.utils.plot_model(model, "my_first_model.png")
q_aware_model = quantize_model(model)
and the output:
Traceback (most recent call last):
  File "<ipython-input-39-af601b78c010>", line 14, in <module>
    q_aware_model = quantize_model(model)
  File "/home/essys/.local/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/quantization/keras/quantize.py", line 137, in quantize_model
    annotated_model = quantize_annotate_model(to_quantize)
  File "/home/essys/.local/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/quantization/keras/quantize.py", line 210, in quantize_annotate_model
    to_annotate, input_tensors=None, clone_function=_add_quant_wrapper)
  ...
  File "/home/essys/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 667, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:

    TypeError: tf__call() got an unexpected keyword argument 'shape'
If somebody knows, please help.
The reason is that your layer is not yet supported for QAT at the moment. If you want to quantize it, you have to write the quantization yourself with quantize_annotate_layer, pass it through quantize_scope, and apply it to your model with quantize_apply, as described here: https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide?hl=en#quantize_custom_keras_layer
I have created a batch_norm_layer here as an example.
TensorFlow 2.x is not complete for QAT layers; please consider using TF 1.x by adding FakeQuant ops after operators.
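For reference, a rough sketch of that approach, assuming the reshape can be expressed as a keras.layers.Reshape layer (the NoOpQuantizeConfig name is illustrative, it simply leaves that one layer unquantized, and this is not tested against your exact versions):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_model_optimization as tfmot

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_scope = tfmot.quantization.keras.quantize_scope

class NoOpQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    """Leave the annotated layer itself unquantized."""
    def get_weights_and_quantizers(self, layer):
        return []
    def get_activations_and_quantizers(self, layer):
        return []
    def set_quantize_weights(self, layer, quantize_weights):
        pass
    def set_quantize_activations(self, layer, quantize_activations):
        pass
    def get_output_quantizers(self, layer):
        return []
    def get_config(self):
        return {}

inputs = keras.Input(shape=(784,))
x = layers.Dense(64, activation="relu")(inputs)
x = layers.Dense(64, activation="relu")(x)
x = layers.Dense(10)(x)
# Express the reshape as a Keras layer and annotate it explicitly.
outputs = quantize_annotate_layer(layers.Reshape((2, 5)), NoOpQuantizeConfig())(x)

annotated_model = quantize_annotate_model(keras.Model(inputs, outputs, name="mnist_model"))
with quantize_scope({'NoOpQuantizeConfig': NoOpQuantizeConfig}):
    q_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)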

Tensorflow object_detection: unable to find input and output tensors

I've successfully trained and saved a Faster R-CNN model for TensorFlow using their object detection API. I'm now trying to run some inference with it, taking bits of code from this tutorial.
However, after I successfully restore the metagraph and the checkpoint, the system can't find the input and output nodes, and I get the following error:
KeyError: "The name 'image_tensor:0' refers to a Tensor which does not
exist. The operation, 'image_tensor', does not exist in the graph."
The checkpoint and metagraph were created by the train.py script, on my own data, following the instructions given here.
This is my code:
OUTPUT_DIR = "my_path/models/SSD_v1/train"
CKPT_DIR = OUTPUT_DIR
LATEST_CKPT_FILENAME = "checkpoint"
LAST_CKPT_FILE = os.path.join(CKPT_DIR, LATEST_CKPT_FILENAME)
MODEL_FILENAME_PATH = os.path.join(OUTPUT_DIR, "model.ckpt.meta")
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

def test_model(images_list, path_to_ckpt=None, meta_graph=None):
    if path_to_ckpt is None:
        path_to_ckpt = tf.train.latest_checkpoint(CKPT_DIR, LATEST_CKPT_FILENAME)
    if meta_graph is None:
        meta_graph = MODEL_FILENAME_PATH
    print("test_model launched")
    tf.reset_default_graph()
    detection_graph = tf.Graph()
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            # Restore graph
            saver = tf.train.import_meta_graph(meta_graph, clear_devices=True)
            print('metagraph restored')
            saver.restore(sess, path_to_ckpt)
            print('graph restored')
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')  # This is where the error happens
            # Each box represents a part of the image where a particular object was detected.
            detected_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the objects.
            # The score is shown on the result image, together with the class label.
            detected_scores = detection_graph.get_tensor_by_name('detection_scores:0')
            detected_classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            print("Output tensors: ")
            print(detected_boxes)
            print(detected_scores)
            print(detected_classes)
            print('')
            for i, image in enumerate(images_list):
                detected_boxes, detected_scores, detected_classes, num_detect = sess.run(
                    [detected_boxes, detected_scores, detected_classes, num_detections],
                    feed_dict={image_tensor: image})
                print(i, num_detect, detected_boxes, detected_scores, detected_classes)

def main():
    directory_path = "../data/samples/"
    image_files = [f for f in os.listdir(directory_path) if os.path.isfile(os.path.join(directory_path, f))]
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_list = [np.expand_dims(load_image_into_numpy_array(Image.open(os.path.join(directory_path, f))), axis=0) for f in image_files]
    test_model(images_list=image_list)

if __name__ == "__main__":
    main()
Full error stacktrace:
Traceback (most recent call last):
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/pano_faster_rcnn/src/run_faster_rcnn_inference.py", line 99, in <module>
    main()
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/pano_faster_rcnn/src/run_faster_rcnn_inference.py", line 95, in main
    test_model(images_list=image_list)
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/pano_faster_rcnn/src/run_faster_rcnn_inference.py", line 48, in test_model
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2733, in get_tensor_by_name
    return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2584, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/guillaumedelaboulaye/PR8210PANO/faster-rcnn/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2626, in _as_graph_element_locked
    "graph." % (repr(name), repr(op_name)))
KeyError: "The name 'image_tensor:0' refers to a Tensor which does not exist. The operation, 'image_tensor', does not exist in the graph."
In the train graph, the input/output nodes are not given those names. What you will need to do is "export" your trained model via the export_inference_graph.py tool. I believe it currently exports to a frozen graph or a SavedModel, but in future releases it will export to an ordinary checkpoint as well.
If you want sample code for finding the node names of the graph, referring to the object_detection_tutorial.ipynb, after the "Load a (frozen) Tensorflow model into memory." block:
for node in od_graph_def.node:
    print(node.name)
That should list all the node names that you can then enter in the subsequent blocks.
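For example, after running export_inference_graph.py, the exported frozen graph can be loaded and its tensors looked up roughly like this (a sketch following the tutorial notebook; the path to frozen_inference_graph.pb is an assumption about where you exported the model):
import tensorflow as tf

# Path produced by export_inference_graph.py (assumed location).
PATH_TO_FROZEN_GRAPH = 'exported_model/frozen_inference_graph.pb'

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

    # List every node name to see what the exported graph actually contains.
    for node in od_graph_def.node:
        print(node.name)

    # The exported graph exposes the tensors the inference code expects.
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')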

TFSlim - problems loading saved checkpoint for VGG16

I'm trying to fine-tune a VGG-16 network using TF-Slim by loading pretrained weights into all layers except the fc8 layer. I achieved this by using the TF-Slim function as follows:
import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets
vgg = nets.vgg
# Specify where the Model, trained on ImageNet, was saved.
model_path = 'path/to/vgg_16.ckpt'
# Specify where the new model will live:
log_dir = 'path/to/log/'
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
predictions = vgg.vgg_16(images)
variables_to_restore = slim.get_variables_to_restore(exclude=['fc8'])
restorer = tf.train.Saver(variables_to_restore)
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    restorer.restore(sess, model_path)
    print "model restored"
This works fine as long as I do not change the num_classes for the VGG-16 model. What I would like to do is change num_classes from 1000 to 200. I was under the impression that if I made this modification by defining a new vgg16-modified function that replaces fc8 to produce 200 outputs (along with variables_to_restore = slim.get_variables_to_restore(exclude=['fc8'])), everything would be fine and dandy. However, TensorFlow complains of a dimension mismatch:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,4096,200] rhs shape= [1,1,4096,1000]
So, how does one really go about doing this? The documentation for TF-Slim is really patchy and there are several versions scattered across GitHub, so I'm not getting much help there.
You can try using slim's way of restoring — slim.assign_from_checkpoint.
There is related documentation in the slim sources:
https://github.com/tensorflow/tensorflow/blob/129665119ea60640f7ed921f36db9b5c23455224/tensorflow/contrib/slim/python/slim/learning.py
Corresponding part:
*************************************************
* Fine-Tuning Part of a model from a checkpoint *
*************************************************
Rather than initializing all of the weights of a given model, we sometimes
only want to restore some of the weights from a checkpoint. To do this, one
need only filter those variables to initialize as follows:
...
# Create the train_op
train_op = slim.learning.create_train_op(total_loss, optimizer)

checkpoint_path = '/path/to/old_model_checkpoint'

# Specify the variables to restore via a list of inclusion or exclusion
# patterns:
variables_to_restore = slim.get_variables_to_restore(
    include=["conv"], exclude=["fc8", "fc9"])
# or
variables_to_restore = slim.get_variables_to_restore(exclude=["conv"])

init_assign_op, init_feed_dict = slim.assign_from_checkpoint(
    checkpoint_path, variables_to_restore)

# Create an initial assignment function.
def InitAssignFn(sess):
    sess.run(init_assign_op, init_feed_dict)

# Run training.
slim.learning.train(train_op, my_log_dir, init_fn=InitAssignFn)
Update
I tried the following:
import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
predictions = nets.vgg.vgg_16(images)
print [v.name for v in slim.get_variables_to_restore(exclude=['fc8']) ]
And got this output (shortened):
[u'vgg_16/conv1/conv1_1/weights:0',
u'vgg_16/conv1/conv1_1/biases:0',
…
u'vgg_16/fc6/weights:0',
u'vgg_16/fc6/biases:0',
u'vgg_16/fc7/weights:0',
u'vgg_16/fc7/biases:0',
u'vgg_16/fc8/weights:0',
u'vgg_16/fc8/biases:0']
So it looks like you should prefix scope with vgg_16:
print [v.name for v in slim.get_variables_to_restore(exclude=['vgg_16/fc8']) ]
gives (shortened):
[u'vgg_16/conv1/conv1_1/weights:0',
u'vgg_16/conv1/conv1_1/biases:0',
…
u'vgg_16/fc6/weights:0',
u'vgg_16/fc6/biases:0',
u'vgg_16/fc7/weights:0',
u'vgg_16/fc7/biases:0']
Update 2
Complete example that executes without errors (on my system):
import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets
s = tf.Session(config=tf.ConfigProto(gpu_options={'allow_growth':True}))
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
predictions = nets.vgg.vgg_16(images, 200)
variables_to_restore = slim.get_variables_to_restore(exclude=['vgg_16/fc8'])
init_assign_op, init_feed_dict = slim.assign_from_checkpoint('./vgg16.ckpt', variables_to_restore)
s.run(init_assign_op, init_feed_dict)
In the example above vgg16.ckpt is a checkpoint saved by tf.train.Saver for 1000 classes VGG16 model.
Using this checkpoint with all variables of 200 classes model (including fc8) gives the following error:
init_assign_op, init_feed_dict = slim.assign_from_checkpoint('./vgg16.ckpt', slim.get_variables_to_restore())
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
1 init_assign_op, init_feed_dict = slim.assign_from_checkpoint(
----> 2 './vgg16.ckpt', slim.get_variables_to_restore())
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/variables.pyc in assign_from_checkpoint(model_path, var_list)
527 assign_ops.append(var.assign(placeholder_value))
528
--> 529 feed_dict[placeholder_value] = var_value.reshape(var.get_shape())
530
531 assign_op = control_flow_ops.group(*assign_ops)
ValueError: total size of new array must be unchanged
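To tie this back to the "how does one really go about doing this" part of the question: a fine-tuning skeleton for the 200-class model, assembled from the snippets above, might look roughly like this (a sketch only; the dummy inputs, loss, and optimizer are stand-ins for your real pipeline and are not part of the original answer):
import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets

# Stand-in input pipeline: replace with your real images/labels.
images = tf.random_uniform([8, 224, 224, 3])
labels = tf.random_uniform([8], maxval=200, dtype=tf.int32)

# Build VGG-16 with the new 200-class fc8 layer.
logits, _ = nets.vgg.vgg_16(images, num_classes=200)

# Restore every variable except the resized fc8 from the 1000-class checkpoint.
variables_to_restore = slim.get_variables_to_restore(exclude=['vgg_16/fc8'])
init_assign_op, init_feed_dict = slim.assign_from_checkpoint('./vgg16.ckpt', variables_to_restore)

def InitAssignFn(sess):
    sess.run(init_assign_op, init_feed_dict)

# Indicative loss/optimizer; adjust to your task.
tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
total_loss = tf.losses.get_total_loss()
optimizer = tf.train.GradientDescentOptimizer(0.001)
train_op = slim.learning.create_train_op(total_loss, optimizer)

slim.learning.train(train_op, 'path/to/log/', init_fn=InitAssignFn)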

Tensorflow inception_v2_resnet inference

With reference to this post:
Using pre-trained inception_resnet_v2 with Tensorflow
I am trying to use the inception_resnet_v2 model to get predictions for images as well. So I looked at the snippet and tried to get it running, but it says "input_tensor" is not defined. Is there anything missing in the code, or can anyone give me a hint on how to get it running / how to define the input_tensor variable?
Here is the snippet again:
import tensorflow as tf
slim = tf.contrib.slim
from PIL import Image
from inception_resnet_v2 import *
import numpy as np
checkpoint_file = 'inception_resnet_v2_2016_08_30.ckpt'
sample_images = ['dog.jpg', 'panda.jpg']
#Load the model
sess = tf.Session()
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(input_tensor, is_training=False)
saver = tf.train.Saver()
saver.restore(sess, checkpoint_file)
for image in sample_images:
    im = Image.open(image).resize((299,299))
    im = np.array(im)
    im = im.reshape(-1,299,299,3)
    predict_values, logit_values = sess.run([end_points['Predictions'], logits], feed_dict={input_tensor: im})
    print(np.max(predict_values), np.max(logit_values))
    print(np.argmax(predict_values), np.argmax(logit_values))
Thanks
The code snippet appears to lack any definition for input_tensor. Looking at the definition of the inception_resnet_v2() function, the fact that the tensor is used in a feed_dict, and the fact that the size of your image is 299 x 299, you could define input_tensor as follows:
input_tensor = tf.placeholder(tf.float32, [None, 299, 299, 3])
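Putting it together with the snippet above, a minimal working version might look like this (a sketch; the [-1, 1] rescaling is taken from the post the question references and is an assumption about the preprocessing you want to keep):
import tensorflow as tf
import numpy as np
from PIL import Image
from inception_resnet_v2 import *

slim = tf.contrib.slim

checkpoint_file = 'inception_resnet_v2_2016_08_30.ckpt'
sample_images = ['dog.jpg', 'panda.jpg']

# Define the missing placeholder and the [-1, 1] preprocessing used in the referenced post.
input_tensor = tf.placeholder(tf.float32, [None, 299, 299, 3], name='input_image')
scaled_input_tensor = 2.0 * (input_tensor / 255.0 - 0.5)

sess = tf.Session()
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(scaled_input_tensor, is_training=False)

saver = tf.train.Saver()
saver.restore(sess, checkpoint_file)

for image in sample_images:
    im = np.array(Image.open(image).resize((299, 299))).reshape(-1, 299, 299, 3)
    predict_values, logit_values = sess.run(
        [end_points['Predictions'], logits], feed_dict={input_tensor: im})
    print(np.argmax(predict_values), np.max(predict_values))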