TF object detection API - Compute evaluation measures failed - tensorflow

I successfully trained a model on my own dataset, exported the inference graph and did the inference on my test dataset.
I now have the detections as a TFRecord file (specified in the input config) and an eval_config file that specifies the metrics set.
When I try to compute the measures as described in the new object detector inference and evaluation measure computation tutorial with
python object_detection/metrics/offline_eval_map_corloc.py --eval_dir=/media/sf_shared --eval_config_path=/media/sf_shared/eval_config.pbtxt --input_config_path=/media/sf_shared/input_config.pbtxt
It returns this AttributeError:
INFO:tensorflow:Processing file: /media/sf_shared/detections.record
INFO:tensorflow:Processed 0 images...
Traceback (most recent call last):
File "object_detection/metrics/offline_eval_map_corloc.py", line 173, in <module>
tf.app.run(main)
File "/home/chrza/anaconda2/envs/tf27/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/metrics/offline_eval_map_corloc.py", line 166, in main
metrics = read_data_and_evaluate(input_config, eval_config)
File "object_detection/metrics/offline_eval_map_corloc.py", line 124, in read_data_and_evaluate
decoded_dict)
File "/home/chrza/anaconda2/envs/tf27/lib/python2.7/site-packages/tensorflow/models/research/object_detection/utils/object_detection_evaluation.py", line 174, in add_single_ground_truth_image_info
(groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult]
AttributeError: 'NoneType' object has no attribute 'size'
Any hints?

I fixed it (temporarily) by replacing the existing lines (195-198) in
object_detection/utils/object_detection_evaluation.py
with the following:
groundtruth_difficult = None
difficult_key = standard_fields.InputDataFields.groundtruth_difficult
if groundtruth_dict.get(difficult_key) is not None:
    # Only read .size once we know the flag array is actually present.
    if groundtruth_dict[difficult_key].size or not groundtruth_classes.size:
        groundtruth_difficult = groundtruth_dict[difficult_key]
The error occurs because the code checks the size of the difficulty flag even when no flag was passed, in which case the value is None.
This fails if you omitted that field from your TFRecords.
Perhaps this was the developers' intent, but the documentation leaves a lot to be desired.
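The same guard can be written as a small standalone function. This is only a sketch, not the Object Detection API's code: select_difficult_flags is a hypothetical helper, the key is passed as a plain string instead of standard_fields.InputDataFields.groundtruth_difficult, and len() stands in for .size so the example runs on plain lists as well as 1-D NumPy arrays.

```python
def select_difficult_flags(groundtruth_dict, groundtruth_classes,
                           key="groundtruth_difficult"):
    """Return the difficulty flags, or None if they are missing or unusable.

    Hypothetical helper for illustration; not part of the Object Detection API.
    """
    flags = groundtruth_dict.get(key)  # None if the key is absent or maps to None
    if flags is None:
        return None
    # Same condition as the original check: keep the flags if they are
    # non-empty, or if the image has no ground-truth boxes at all.
    if len(flags) > 0 or len(groundtruth_classes) == 0:
        return flags
    return None


# Records written without a difficulty flag no longer raise AttributeError.
print(select_difficult_flags({"groundtruth_difficult": None}, [1, 2]))    # None
print(select_difficult_flags({}, [1, 2]))                                 # None
print(select_difficult_flags({"groundtruth_difficult": [0, 1]}, [1, 2]))  # [0, 1]
```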

Related

Can't save YOLOv4 model because of array shape mismatch

I am able to run transfer learning on YOLOv4 and my custom dataset with the following command (which runs successfully and can identify test images I present to the model):
!./darknet detector train /content/darknet/build/darknet/x64/data/obj.data /content/darknet/build/darknet/x64/cfg/yolov4_train.cfg /content/darknet/build/darknet/x64/yolov4.conv.137 -dont_show
I am using the save_model.py tool from this GitHub repository:
!git clone https://github.com/hunglc007/tensorflow-yolov4-tflite
When I enter the following command to save the model it fails:
!python3 save_model.py --weights /content/darknet/build/darknet/x64/backup/yolov4_train_final.weights --output ./checkpoints/yolov4-224 --input_size 224
The failure is a mismatch between the weights saved in training and the expected array shape in the core/utility module utils.py (line 63):
Traceback (most recent call last):
File "save_model.py", line 58, in <module>
app.run(main)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "save_model.py", line 54, in main
save_tf()
File "save_model.py", line 49, in save_tf
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
File "/content/tensorflow-yolov4-tflite/core/utils.py", line 65, in load_weights
conv_weights = conv_weights.reshape(conv_shape).transpose([2, 3, 1, 0])
ValueError: cannot reshape array of size 4554552 into shape (1024,512,3,3)
I added a debug print, and it looks like it gets all the way to the last layer before choking. In other words, all the previous layers pass this line of utils.py with the saved weights matching the expected array shape. I think this is somehow related to the fact that I'm using image sizes of 224,224,3 instead of 416,416,3, but I did specify that via input_size. For completeness, here are the last couple of debug prints before the traceback above:
layer (out_dim, in_dim, height, width) 107 512 1024 1 1
layer (out_dim, in_dim, height, width) 108 1024 512 3 3
If anyone has any ideas, that would be great!
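The numbers in the error message can be checked by hand. The sketch below is one possible reading, not a confirmed diagnosis. It assumes the standard YOLOv4 layout (three detection convolutions with 256, 512, and 1024 input channels, each a 1x1 convolution with biases and 3 * (classes + 5) filters, the last of them coming after layer 108), that the converter builds its model for 80 classes, and that the failing read returned every float left in the file. All names are hypothetical.

```python
# Numbers taken from the error message and debug prints above.
FAILING_SHAPE = (1024, 512, 3, 3)   # layer 108
FLOATS_AVAILABLE = 4554552          # what was left in the .weights file

# Assumptions (not confirmed by the post): input channels of the three
# detection convolutions and the class count the converter builds for.
HEAD_IN_CHANNELS = (256, 512, 1024)
BUILD_CLASSES = 80


def head_floats(in_channels, num_classes):
    """Floats stored for one 1x1 detection conv: weights plus biases."""
    filters = 3 * (num_classes + 5)
    return filters * in_channels + filters


def floats_left_at_failure(file_classes):
    """Floats left at the failing read if the file was trained for file_classes."""
    expected = 1
    for dim in FAILING_SHAPE:
        expected *= dim
    # The two earlier heads are read with BUILD_CLASSES filters, so the loader
    # consumes more floats than the file actually stores for them.
    over_read = sum(head_floats(c, BUILD_CLASSES) - head_floats(c, file_classes)
                    for c in HEAD_IN_CHANNELS[:2])
    # What remains: layer 108's weights plus the last head, minus the over-read.
    return expected + head_floats(HEAD_IN_CHANNELS[2], file_classes) - over_read


expected_108 = 1024 * 512 * 3 * 3
print(expected_108, expected_108 - FLOATS_AVAILABLE)  # 4718592 164040

matches = [c for c in range(1, BUILD_CLASSES + 1)
           if floats_left_at_failure(c) == FLOATS_AVAILABLE]
print(matches)  # [1]
```

Under these assumptions, the only class count that reproduces 4554552 is 1. That would point to a class-count mismatch between the trained weights and the model the converter builds, rather than to the 224 input size. If the custom dataset does not have exactly one class, the assumptions above do not hold.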

Trying to restore model, but tf.train.import_meta_graph(meta_path) raises error

I downloaded the pretrained MobileNetV2 models from tensorflow models and tried to restore the graph, but got an unexpected error.
The code to reproduce the error is short:
import tensorflow as tf
meta_path = 'path/to/mobilenet_v2_0.35_224/mobilenet_v2_0.35_224.ckpt.meta'
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
saver = tf.train.import_meta_graph(meta_path)
The last line then raises this error:
Traceback (most recent call last):
File "/home/CVAR/study/codes/languages/python/pycharm/learn_tensorflow/train_mobileNet_v2/test_of_functions/saver_test.py", line 21, in <module>
saver = tf.train.import_meta_graph(meta_path)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1960, in import_meta_graph
**kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/meta_graph.py", line 744, in import_scoped_meta_graph
producer_op_list=producer_op_list)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 391, in import_graph_def
_RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 158, in _RemoveDefaultAttrs
op_def = op_dict[node.op]
KeyError: 'InfeedEnqueueTuple'
My system information is:
ubuntu 16.04
python 3.5
tensorflow-gpu 1.9
Any idea?
I ran into the same problem recently. The likely cause is that the TensorFlow version used to train the model differs from the version used to read the graph description proto. Reinstalling the TensorFlow version that was used for training should fix it; alternatively, retraining the model with your current version would also work.
FYI, I trained with TensorFlow 1.12.0 and loaded the graph with 1.13.1. Reinstalling solved the problem.
Some ops are not defined. Adding from conv_blocks import * fixes this error, but I then ran into another problem: "ValueError: NodeDef expected inputs 'float, int32' do not match 1 inputs specified;". I am still debugging, but I hope this tip solves your problem.
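For the version-mismatch explanation above, one way to catch the problem before importing the graph is to compare the version noted at training time with the running one. A minimal sketch, assuming you recorded the training version yourself; same_minor_version is a hypothetical helper, and at runtime the second argument would be tf.__version__. Matching versions do not guarantee a successful import; a mismatch is only a hint.

```python
def same_minor_version(trained_version, runtime_version):
    """True if both version strings share the same major.minor prefix."""
    def major_minor(version):
        parts = version.split(".")
        return int(parts[0]), int(parts[1])
    return major_minor(trained_version) == major_minor(runtime_version)


# Versions quoted in the answer above.
print(same_minor_version("1.12.0", "1.13.1"))  # False
print(same_minor_version("1.12.0", "1.12.3"))  # True
```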

'MemoryError' when padding sequences using tensorflow

I am trying to train my model on an AWS g2.2xlarge instance, but I get a MemoryError when padding my sequences:
content_array = keras.preprocessing.sequence.pad_sequences(content_array, maxlen=max_sequence_length,
                                                           padding='post')
This is the error:
Traceback (most recent call last):
File "trainer.py", line 185, in <module>
train()
File "trainer.py", line 52, in train
padding='post')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/preprocessing/sequence.py", line 94, in pad_sequences
x = (np.ones((num_samples, maxlen) + sample_shape) * value).astype(dtype)
MemoryError
Any idea why? I haven't even started training the model.
I was calculating the maximum sequence length incorrectly, which led to a huge number. After correcting it, I no longer have any issues.
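The traceback shows pad_sequences building a dense (num_samples, maxlen) array with np.ones, so the memory needed grows linearly with maxlen. A rough sketch of that estimate, with hypothetical helper names and toy data; the 8-byte item size assumes the default float64 of np.ones and ignores temporaries.

```python
def padded_array_bytes(num_samples, maxlen, itemsize=8):
    """Bytes for a dense (num_samples, maxlen) array; 8 = float64 item size."""
    return num_samples * maxlen * itemsize


def max_len_from_data(sequences):
    """Longest sequence length, computed directly from the data."""
    return max(len(seq) for seq in sequences)


sequences = [[3, 7], [1, 2, 3, 4], [9]]      # toy data
good_maxlen = max_len_from_data(sequences)    # 4
bad_maxlen = 10 ** 9                          # e.g. a miscalculated length

print(padded_array_bytes(len(sequences), good_maxlen))  # 96
print(padded_array_bytes(len(sequences), bad_maxlen))   # 24000000000
```

Printing the estimate before calling pad_sequences makes a miscalculated maxlen obvious before any memory is allocated.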

Near empty frozen graph after using freeze_graph from Tensorflow

I am currently trying to strip the training operations from my GraphDef so that I can run it on Android. However, to do so, I need to first freeze the graph using Tensorflow's freeze_graph.py script.
However, I get the error UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 331: invalid start byte when attempting to run the bash script:
#!/bin/bash
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/Users/leslie/Downloads/trained_model.pb \
--input_checkpoint=/Users/leslie/Downloads/Y6_1478303913_Leslie \
--output_graph=/tmp/frozen_graph.pb --output_node_names=Y_GroundTruth
Could this be a problem in the way I created my graph and checkpoint? I created the input_graph via tf.train.write_graph(sess.graph_def, location, 'trained_model.pb', as_text=False) and the checkpoint is created via saver.save(sess, chkpointpath). Answers from StackOverflow say that the python script has non-ascii characters and that I should just simply strip them from the python script but I do not think that is such a great idea.
Full traceback:
Traceback (most recent call last):
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 135, in <module>
tf.app.run()
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 132, in main
FLAGS.output_graph, FLAGS.clear_devices, FLAGS.initializer_nodes)
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 98, in freeze_graph
text_format.Merge(f.read().decode("utf-8"), input_graph_def)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 331: invalid start byte
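The traceback ends in f.read().decode("utf-8"), which suggests the script is treating the input graph as a text protobuf. A file written with as_text=False is binary and will generally not be valid UTF-8. The sketch below only illustrates that distinction with made-up bytes: is_utf8_text is a hypothetical helper, and the byte strings are not taken from the real files. In the freeze_graph.py versions I am aware of, an input_binary option selects the binary parser instead, but treat that as an assumption to verify against your checkout.

```python
def is_utf8_text(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up examples: a text-format node and some binary-looking bytes.
text_proto = b'node {\n  name: "Y_GroundTruth"\n  op: "Placeholder"\n}\n'
binary_like = b"\n\x06\x96\x01"  # contains 0x96, which cannot start a UTF-8 character

print(is_utf8_text(text_proto))   # True
print(is_utf8_text(binary_like))  # False
```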
I also generated my protobuf file with as_text = True and the error above did not show up. However, I only got the following output.
Converted 0 variables to const ops.
1 ops in the final graph.
Complete contents of "frozen_graph.pb"
6
Y_GroundTruth��Placeholder*�
�dtype��0�*�
�shape��:
Snippet of PB-file generation code:
#Start all code before training code
# Tensor placeholders and variables
...
# Network weights and biases
...
# Network layer definitions
...
# Definition of cost function
...
# Create optimizer
...
# Session operations
...
#END all code before training code
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, model_save_path)
    sess.run(tf.initialize_all_variables())
    tf.train.write_graph(sess.graph_def, outputlocation, 'trained_model.pb', as_text=False)
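Freezing keeps only the nodes needed to compute the requested output nodes. Y_GroundTruth appears in the dump above as a Placeholder, which has no inputs, so requesting it as the output would leave a one-node graph with nothing to convert. A toy sketch of that pruning on a made-up graph follows; node names other than Y_GroundTruth are hypothetical, and this is plain Python, not TensorFlow's implementation.

```python
def ancestors(graph, output_names):
    """Return the set of nodes reachable backwards from the output nodes.

    graph maps each node name to the list of its input node names.
    """
    keep = set()
    stack = list(output_names)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        stack.extend(graph[name])
    return keep


toy_graph = {
    "X": [],                        # input placeholder
    "Y_GroundTruth": [],            # label placeholder, no inputs
    "weights": [],                  # variable
    "prediction": ["X", "weights"],
    "cost": ["prediction", "Y_GroundTruth"],
}

print(sorted(ancestors(toy_graph, ["Y_GroundTruth"])))  # ['Y_GroundTruth']
print(sorted(ancestors(toy_graph, ["prediction"])))     # ['X', 'prediction', 'weights']
```

In this toy graph, choosing the prediction node as the output keeps the variable it depends on, which is what would give a freeze step something to convert.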

the mini-batch of deep and wide in tensorflow

I am trying to get the deep & wide model working on a large dataset.
The hidden units of the deep side are [1024, 512, 256].
We use tf.SparseTensor() to store our data.
I get the error below when I use 40 million instances as training data.
***
m.fit(input_fn=lambda: input_fn(df_train), steps=FLAGS.train_steps)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 182, in fit
monitors=monitors)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 458, in _train_model
summary_writer=graph_actions.get_summary_writer(self._model_dir))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/graph_actions.py", line 76, in get_summary_writer
graph=ops.get_default_graph())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/summary_io.py", line 113, in __init__
self.add_graph(graph=graph, graph_def=graph_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/summary_io.py", line 204, in add_graph
true_graph_def = graph.as_graph_def(add_shapes=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2117, in as_graph_def
raise ValueError("GraphDef cannot be larger than 2GB.")
ValueError: GraphDef cannot be larger than 2GB.
So I want to use mini-batches to solve this problem, but it is not working. How do I use mini-batches to handle big data?
To train with minibatches, you just call model.train multiple times per epoch, feeding it a subset of the data each time. You can feed the data without loading it into the graphdef by either using feed_dict or by using one of the data reading ops described in the reading data tutorial.
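A minimal sketch of that loop in plain Python, with hypothetical names (iter_minibatches, train_step) and a toy list standing in for the 40 million instances. In a real run, train_step would hand the batch to the model through feed_dict or an input function, so the data never becomes part of the GraphDef.

```python
def iter_minibatches(rows, batch_size):
    """Yield consecutive slices of rows with at most batch_size items each."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def train_step(batch):
    """Placeholder for one training call; here it only reports the batch size."""
    return len(batch)


rows = list(range(10))  # toy stand-in for the training instances
sizes = [train_step(batch) for batch in iter_minibatches(rows, batch_size=4)]
print(sizes)  # [4, 4, 2]
```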