Near-empty frozen graph after using freeze_graph from TensorFlow

I am trying to strip the training operations from my GraphDef so that I can run it on Android. To do so, I first need to freeze the graph using TensorFlow's freeze_graph.py script.
However, I get the error UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 331: invalid start byte when I run this bash script:
#!/bin/bash
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/Users/leslie/Downloads/trained_model.pb \
--input_checkpoint=/Users/leslie/Downloads/Y6_1478303913_Leslie \
--output_graph=/tmp/frozen_graph.pb --output_node_names=Y_GroundTruth
Could this be a problem with the way I created my graph and checkpoint? I created the input_graph via tf.train.write_graph(sess.graph_def, location, 'trained_model.pb', as_text=False), and the checkpoint via saver.save(sess, chkpointpath). Answers on StackOverflow say that the Python script contains non-ASCII characters and that I should simply strip them from it, but I do not think that is a good idea.
Full traceback:
Traceback (most recent call last):
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 135, in <module>
tf.app.run()
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 132, in main
FLAGS.output_graph, FLAGS.clear_devices, FLAGS.initializer_nodes)
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 98, in freeze_graph
text_format.Merge(f.read().decode("utf-8"), input_graph_def)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 331: invalid start byte
I also generated my protobuf file with as_text=True, and the error above did not show up. However, I only got the following output:
Converted 0 variables to const ops.
1 ops in the final graph.
Complete contents of "frozen_graph.pb"
6
Y_GroundTruth��Placeholder*�
�dtype��0�*�
�shape��:
Snippet of PB-file generation code:
#Start all code before training code
# Tensor placeholders and variables
...
# Network weights and biases
...
# Network layer definitions
...
# Definition of cost function
...
# Create optimizer
...
# Session operations
...
#END all code before training code
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, model_save_path)
    sess.run(tf.initialize_all_variables())
    tf.train.write_graph(sess.graph_def, outputlocation, 'trained_model.pb', as_text=False)
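A note on the UnicodeDecodeError: the traceback shows freeze_graph.py calling f.read().decode("utf-8") before text_format.Merge, so it is treating the input as a text-format GraphDef, while as_text=False writes a binary one. A binary file can contain bytes such as 0x96 that are never valid at the start of a UTF-8 character. The sketch below reproduces the same kind of error in plain Python 3 with made-up bytes (binary_blob is purely illustrative, not a real GraphDef):

```python
# Made-up bytes standing in for a binary file: some ASCII, then 0x96.
# 0x96 is a UTF-8 continuation byte, so it can never begin a character.
binary_blob = b"graph\x96\x01"

try:
    binary_blob.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# The exception records which byte offset failed and why.
print(decode_error.start, decode_error.reason)
```

If this is what is happening here, either telling the script that the input is binary (freeze_graph.py has an input_binary flag, as far as I know) or writing the graph with as_text=True, as already tried above, should avoid the decode step.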

Related

Can't save YOLOv4 model because of array shape mismatch

I am able to run transfer learning on YOLOv4 and my custom dataset with the following command (which runs successfully and can identify test images I present to the model):
!./darknet detector train /content/darknet/build/darknet/x64/data/obj.data /content/darknet/build/darknet/x64/cfg/yolov4_train.cfg /content/darknet/build/darknet/x64/yolov4.conv.137 -dont_show
I am using the save_model.py tool from this github site:
!git clone https://github.com/hunglc007/tensorflow-yolov4-tflite
When I enter the following command to save the model it fails:
!python3 save_model.py --weights /content/darknet/build/darknet/x64/backup/yolov4_train_final.weights --output ./checkpoints/yolov4-224 --input_size 224
The failure is a mismatch between the weights saved in training and the expected array shape in the core/utility module utils.py (line 63):
Traceback (most recent call last):
File "save_model.py", line 58, in <module>
app.run(main)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "save_model.py", line 54, in main
save_tf()
File "save_model.py", line 49, in save_tf
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
File "/content/tensorflow-yolov4-tflite/core/utils.py", line 65, in load_weights
conv_weights = conv_weights.reshape(conv_shape).transpose([2, 3, 1, 0])
ValueError: cannot reshape array of size 4554552 into shape (1024,512,3,3)
I added a debug print, and it looks like it gets all the way to one of the last layers before choking. In other words, all the previous layers get through this line of code in utils.py with a match between the saved weights and the array shape. I think this is somehow related to the fact that I'm using image sizes of 224,224,3 instead of 416,416,3, but I did specify that in input_size. For completeness, here are the last couple of debug prints before the traceback above:
layer (out_dim, in_dim, height, width) 107 512 1024 1 1
layer (out_dim, in_dim, height, width) 108 1024 512 3 3
If anyone has any ideas, that would be great!
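For reference, the numbers in the ValueError can be checked directly. The sketch below only does the arithmetic on the two values from the traceback (the variable names are mine); it shows that fewer values were available than the (1024, 512, 3, 3) shape requires. One possible reading, which is an assumption and not confirmed anywhere above, is that the weights file ran out of values before this layer could be read in full.

```python
import math

# Shape the loader expects for this layer, taken from the traceback.
expected_shape = (1024, 512, 3, 3)
# Number of values that were actually available, also from the traceback.
available = 4554552

expected = math.prod(expected_shape)  # values needed to fill the shape
shortfall = expected - available      # how many values are missing

print("expected values: ", expected)
print("available values:", available)
print("shortfall:       ", shortfall)
```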

How can I run mobiledet model successfully with the pretrained model in TF1 model zoo from TensorFlow object detection api?

I want to test the MobileDet model provided in the TF1 model zoo of the TensorFlow Object Detection API.
The pretrained files contain both the pb file and the ckpt files.
So I have tried two methods of loading the pretrained model for inference.
First, I tried to load tflite_graph.pb directly. I ran into the following problem; changing the TF version did not solve it.
The code is:
import os
import tensorflow as tf
MODEL_DIR = '/tf_ckpts/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19/'
MODEL_CHECK_FILE = os.path.join(MODEL_DIR, 'tflite_graph.pb')
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.Open(MODEL_CHECK_FILE, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')
Traceback (most recent call last):
File "/home/zhaoxin/workspace/models-1.12.0/research/inference_demo.py", line 41, in <module>
tf.import_graph_def(graph_def, name='')
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 505, in _import_graph_def_internal
raise ValueError(str(e))
ValueError: NodeDef mentions attr 'exponential_avg_factor' not in Op<name=FusedBatchNormV3; signature=x:T, scale:U, offset:U, mean:U, variance:U -> y:T, batch_mean:U, batch_variance:U, reserve_space_1:U, reserve_space_2:U, reserve_space_3:U; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT]; attr=U:type,allowed=[DT_FLOAT]; attr=epsilon:float,default=0.0001; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=is_training:bool,default=true>; NodeDef: {{node FeatureExtractor/MobileDetCPU/Conv/BatchNorm/FusedBatchNormV3}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
Then, I tried to load the ckpt files to run the model.
mobiledet = 'tf_ckpts/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19/'
meta_path = mobiledet+'model.ckpt-400000.meta'
ckpt_path = mobiledet+'model.ckpt-400000'
with tf.Session() as sess:
    saver = tf.train.import_meta_graph(meta_path)
    saver.restore(sess, ckpt_path)
    graph = tf.get_default_graph()
The error is:
Traceback (most recent call last):
File "/home/zhaoxin/workspace/models-1.12.0/research/tf_load.py", line 15, in <module>
saver=tf.train.import_meta_graph(meta_path)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1453, in import_meta_graph
**kwargs)[0]
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1477, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 501, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'LegacyParallelInterleaveDatasetV2' in binary running on localhost.localdomain. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
It seems that the loading errors of both methods are caused by a TF version mismatch, but I have tried many TF versions and failed to solve it. Has anyone successfully run the MobileDet model from the TF1 object detection model zoo?
OS: linux
TF version: tf 1.15
#Shane Zhao - are you planning on training with a custom dataset, or are you using the pretrained graph as is? To the best of my knowledge, the version of TensorFlow should only matter during training. Anyway, please refer to this demo from Google in Colab - https://colab.research.google.com/github/luxonis/depthai-ml-training/blob/master/colab-notebooks/Easy_Object_Detection_Demo_Training.ipynb#scrollTo=JDddx2rPfex9

TF object detection API - Compute evaluation measures failed

I successfully trained a model on my own dataset, exported the inference graph and did the inference on my test dataset.
I now have the detections as a tfrecord file (specified in the input config) and an eval_config file with the metrics set specified.
When I try to compute the measures as in the new object detector inference and evaluation measure computation tutorial with
python object_detection/metrics/offline_eval_map_corloc.py --eval_dir=/media/sf_shared --eval_config_path=/media/sf_shared/eval_config.pbtxt --input_config_path=/media/sf_shared/input_config.pbtxt
it returns this AttributeError:
INFO:tensorflow:Processing file: /media/sf_shared/detections.record
INFO:tensorflow:Processed 0 images...
Traceback (most recent call last):
File "object_detection/metrics/offline_eval_map_corloc.py", line 173, in <module>
tf.app.run(main)
File "/home/chrza/anaconda2/envs/tf27/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/metrics/offline_eval_map_corloc.py", line 166, in main
metrics = read_data_and_evaluate(input_config, eval_config)
File "object_detection/metrics/offline_eval_map_corloc.py", line 124, in read_data_and_evaluate
decoded_dict)
File "/home/chrza/anaconda2/envs/tf27/lib/python2.7/site-packages/tensorflow/models/research/object_detection/utils/object_detection_evaluation.py", line 174, in add_single_ground_truth_image_info
(groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult]
AttributeError: 'NoneType' object has no attribute 'size'
Any hints?
I fixed it (temporarily) as follows:
# "is not None" avoids the ambiguous truth value of a multi-element NumPy array
if (standard_fields.InputDataFields.groundtruth_difficult in groundtruth_dict.keys()) and groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult] is not None:
    if groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult].size or not groundtruth_classes.size:
        groundtruth_difficult = groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult]
in place of the existing lines (195-198) in
object_detection/utils/object_detection_evaluation.py
The error occurs because the code checks the size of the groundtruth_difficult entry even when no difficulty flag was passed.
That fails if you skipped that parameter in your tf records.
Perhaps this was the intent of the developers, but the clarity of the documentation certainly leaves a lot to be desired.
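The guard in the workaround above boils down to treating a missing key and a None value the same way before asking for a size. A minimal sketch of that idea with a plain dict; has_difficult_flags is a hypothetical helper name and the string key is a stand-in, neither is part of the Object Detection API:

```python
def has_difficult_flags(groundtruth_dict, key="groundtruth_difficult"):
    """Return True only when flags were supplied and hold at least one entry."""
    flags = groundtruth_dict.get(key)  # None if the key is missing
    if flags is None:                  # also covers a key stored as None
        return False
    return len(flags) > 0              # safe: flags is a real sequence here


print(has_difficult_flags({}))                                # key absent
print(has_difficult_flags({"groundtruth_difficult": None}))   # key set to None
print(has_difficult_flags({"groundtruth_difficult": [0, 1]})) # flags present
```

Checking "is None" explicitly, instead of relying on truthiness, also works when the value is a NumPy array with more than one element, whose truth value is ambiguous.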

tensorflow: ValueError: GraphDef cannot be larger than 2GB

This is the error I got:
Traceback (most recent call last):
File "fully_connected_feed.py", line 387, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "fully_connected_feed.py", line 289, in main
run_training()
File "fully_connected_feed.py", line 256, in run_training
saver.save(sess, checkpoint_file, global_step=step)
File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1386, in save
self.export_meta_graph(meta_graph_filename)
File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1414, in export_meta_graph
graph_def=ops.get_default_graph().as_graph_def(add_shapes=True),
File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2257, in as_graph_def
result, _ = self._as_graph_def(from_version, add_shapes)
File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2220, in _as_graph_def
raise ValueError("GraphDef cannot be larger than 2GB.")
ValueError: GraphDef cannot be larger than 2GB.
I believe it results from this code:
weights = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="hidden1")[0]
weights = tf.scatter_nd_update(weights,indices, updates)
weights = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="hidden2")[0]
weights = tf.scatter_nd_update(weights,indices, updates)
I am not sure why my model is getting so big (15k steps and 240MB). Any thoughts? Thanks!
It's hard to say what is happening without seeing the code, but in general TensorFlow model sizes will not increase with number of steps - they should be fixed.
If the model size is increasing with number of steps, it suggests that the computation graph is being added to on every step. For example, something like:
import tensorflow as tf
with tf.Session() as sess:
    for i in xrange(1000):
        sess.run(tf.add(1, 2))
        # or perhaps sess.run(tf.scatter_nd_update(...)) in your case
will create 3000 nodes in the graph (one for add, one for '1', and one for '2' on every iteration). Instead, you want to define your computational graph once and run it repeatedly with something like:
import tensorflow as tf
x = tf.add(1, 2)
# or perhaps x = tf.scatter_nd_update(...) in your case
with tf.Session() as sess:
    for i in xrange(1000):
        sess.run(x)
This will have a fixed graph of 3 nodes for all 1000 iterations (and any more). Hope that helps.

error while merging summaries for tensorboard

I am trying to generate the graph for the MNIST beginner tutorial but am getting the following error. For some reason, the merged_summary_op object is None.
Traceback (most recent call last):
File "mnist1.py", line 48, in <module>
summary_str = sess.run(merged_summary_op)
File "/home/vagrant/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 307, in run
% (subfetch, fetch, type(subfetch), e.message))
TypeError: Fetch argument None of None has invalid type <type 'NoneType'>, must be a string or Tensor. (Can not convert a NoneType into a Tensor or Operation.)
I think I am missing a step here. I launched the session first and then ran the statement:
merged_summary_op = tf.merge_all_summaries()
I had the same error.
In my case, adding at least one tf.scalar_summary() before calling tf.merge_all_summaries() solved the problem.
For example,
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
tf.scalar_summary("cross_entropy", cross_entropy)
merged_summary_op = tf.merge_all_summaries()
I hope this snippet helps you.