I am doing a research project that consists of an object detection AI capable of detecting 7 classes of objects through a webcam.
Using Google Colab, I successfully trained the ssd_mobilenet_v2_quantized_300x300_coco model with TensorFlow 1.15.
The objective is to run the model on a Raspberry Pi 3B+ with the official camera and a Google Coral Edge TPU device, so the model must be quantized in order to use it.
The issue comes with the testing part: after training the model, I converted it to TFLite using:
!python export_tflite_ssd_graph.py --pipeline_config_path="/content/drive/My Drive/Colab Data/models/research/object_detection/training/ssd_mobilenet_v2_quantized_300x300_coco.config" --trained_checkpoint_prefix=training/model.ckpt-28523 --output_directory=compiler/ --add_postprocessing_op=true
and
!tflite_convert --graph_def_file=compiler/tflite_graph.pb --output_file=compiler/detect.tflite --output_format=TFLITE --input_shapes=1,300,300,3 --input_arrays=normalized_input_image_tensor --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' --inference_type=QUANTIZED_UINT8 --mean_values=128 --std_dev_values=128 --change_concat_input_ranges=false --allow_custom_ops
Converted model: https://gofile.io/d/kOe2Ac
I tried to test the model using Edje Electronics' webcam script, found here,
and it outputs this error:
RuntimeError: tensorflow/lite/kernels/detection_postprocess.cc:404 ValidateBoxes(decoded_boxes, num_boxes) was not true.Node number 98 (TFLite_Detection_PostProcess) failed to invoke.
The weirdest thing is that if I run the same script on my current workstation (with TensorFlow 1.15.1), the code runs flawlessly, so something must be wrong with the RPi.
The RPi is running TensorFlow 1.15.2, built from the WHL source. I actually tried every version I could, but I always get the same error.
I would be very grateful for any help. Thanks in advance.
Ok, thanks for the info! But how is it that it works flawlessly on another PC and not on the RPi? Do I need to install tf-nightly instead of tensorflow? Could this be due to the CPU architecture?
ValidateBoxes verifies whether the xmin, xmax, ymin, ymax of a detected box make sense. The issue is usually caused by detected boxes with the same xmin/xmax or ymin/ymax, which should not be flagged as an error, since the rest of the code tolerates that case perfectly well. The issue has been fixed in TFLite, and the fix is now available through the tf-nightly build.
If you'd like to build TF from source, here is a workaround solution.
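If it helps to narrow things down, here is a minimal sketch (assuming the detect.tflite from the question is in the working directory; the dummy input is just for the test) that prints the installed TF version and invokes the model once, so you can check whether a given runtime build on the RPi still trips the ValidateBoxes check:
# Minimal sanity check: invoke the converted model once on a dummy frame.
# Assumes detect.tflite (from the question) is in the current directory.
import numpy as np
import tensorflow as tf

print("TensorFlow version:", tf.__version__)

interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()

# The quantized SSD expects a uint8 tensor of shape [1, 300, 300, 3].
dummy = np.random.randint(0, 256, size=input_details[0]['shape'], dtype=np.uint8)
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()  # a failing runtime raises the ValidateBoxes RuntimeError here

for out in interpreter.get_output_details():
    print(out['name'], interpreter.get_tensor(out['index']).shape)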
Related
I am trying to run Deep SORT for real-time object tracking with the yolov4-tiny model on a webcam, using this GitHub repository.
https://github.com/theAIGuysCode/yolov4-deepsort
But the repository only gives a command for yolov4 for real-time object detection using a webcam. How can I modify the code, and what is the command to run yolov4-tiny on a webcam for real-time object detection?
I would also be glad if you could suggest another, less resource-hungry way to run object tracking, or a way to do it in TensorFlow Lite.
The answer is on the setup page https://github.com/theAIGuysCode/yolov4-deepsort. You don't need to make changes; you just need to get the right weights and then run it pointing at the correct weights, like:
# Run yolov4-tiny object tracker
python object_tracker.py --weights ./checkpoints/yolov4-tiny-416 --model yolov4 --video ./data/video/test.mp4 --output ./outputs/tiny.avi --tiny
You can look at the yolov4.py code and you will see how it checks for the normal or tiny variant when you pass the command-line flags; see https://github.com/theAIGuysCode/yolov4-deepsort/blob/master/core/yolov4.py. A rough sketch of the idea follows below.
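As an illustration of that pattern (this is not the repository's actual code, just a sketch; the names are hypothetical), the --tiny flag simply selects a different configuration and checkpoint path:
# Illustrative sketch only -- names are hypothetical; see core/yolov4.py in the repo
# for the real flag handling.
def select_config(is_tiny):
    if is_tiny:
        return {"weights": "./checkpoints/yolov4-tiny-416", "strides": [16, 32]}
    return {"weights": "./checkpoints/yolov4-416", "strides": [8, 16, 32]}

print(select_config(is_tiny=True))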
Overview
I know this subject has been discussed many times, but I am having a hard time understanding the workflow, or rather, the variations of the workflow.
For example, imagine you are installing TensorFlow on Windows 10. The main goal is to train a custom model, convert it to TensorFlow Lite, and copy the converted .tflite file to a Raspberry Pi running TensorFlow Lite.
The confusion for me starts with the conversion process. After following along with multiple guides, it seems TensorFlow is often installed with pip or Anaconda. But then I see detailed tutorials which indicate it needs to be built from source in order to convert from TensorFlow models to TFLite models.
To make things more interesting, I've also seen models which are converted via Python scripts as seen here.
Question
So far I have seen 3 ways to do this conversion, and it could just be that I don't have a grasp on the full picture. Below are the abbreviated methods I have seen:
Build from source, and use the TensorFlow Lite Optimizing Converter (TOCO):
bazel run --config=opt tensorflow/lite/toco:toco -- --input_file=$OUTPUT_DIR/tflite_graph.pb --output_file=$OUTPUT_DIR/detect.tflite ...
Use the TensorFlow Lite Converter Python API:
import tensorflow as tf

# export_dir points at the SavedModel directory
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)
Use the tflite_convert CLI utilities:
tflite_convert --saved_model_dir=/tmp/mobilenet_saved_model --output_file=/tmp/mobilenet.tflite
I *think* I understand that options 2 and 3 are the same, in the sense that the tflite_convert utility is installed and can be invoked either from the command line or through a Python script. But is there a specific reason you should choose one over the other?
And lastly, what really gets me confused is option 1. Maybe it's a version thing (1.x vs 2.x)? But what's the difference between the TensorFlow Lite Optimizing Converter (TOCO) and the TensorFlow Lite Converter? It appears that in order to use TOCO you would have to build TensorFlow from source, so is there a reason you would use one over the other?
There is no difference in the output from the different conversion methods, as long as the parameters remain the same. The Python API is better if you want to generate TFLite models in an automated way (e.g., from a Python script that runs periodically).
The TensorFlow Lite Optimizing Converter (TOCO) was the first version of the TF-to-TFLite converter. It was recently deprecated and replaced with a new converter that can handle more ops/models, so I wouldn't recommend using toco:toco via bazel; use tflite_convert as mentioned here instead.
You should never have to build the converter from source, unless you are making some changes to it and want to test them out.
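For example, a small script along these lines (the paths are placeholders taken from the CLI example above, and the optimization flag is an optional extra, not something from the question) is easy to run on a schedule, which is where the Python API is more convenient than the CLI:
# Hedged sketch: scripted SavedModel -> TFLite conversion (TF 2.x API).
# The paths are placeholders; replace them with your own export directory and output file.
import tensorflow as tf

def convert(saved_model_dir, output_path):
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional post-training optimization
    tflite_model = converter.convert()
    with open(output_path, "wb") as f:
        f.write(tflite_model)

if __name__ == "__main__":
    convert("/tmp/mobilenet_saved_model", "/tmp/mobilenet.tflite")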
I'm trying to deploy a trained-from-scratch model onto a USB webcam attached to a Google Coral dev board. I cut the training off at 15k steps. The model had not fully converged at 15k, but I wanted to work out how to freeze the graph, convert it to TFLite, and move it onto the dev board. I realise this may be an issue in terms of accuracy. The model is MobileNetV2.
The images below are of a ceiling and a desk. The detection boxes appear in the exact same place every time, no matter what the camera is looking at. The boxes do not move, nor do the percentages change when the camera moves.
Train model
python train.py --logtostderr --train_dir=training --pipeline_config_path=training/ssd_mobilenet_v1_coco.config
Freeze model
python export_tflite_ssd_graph.py --pipeline_config_path=training/ssd_mobilenet_v2_coco.config --trained_checkpoint_prefix=training/model.ckpt-9070 --output_directory=inference_graph --add_postprocessing_op=true
Quantize
tflite_convert --graph_def_file=inference_graph/tflite_graph.pb --output_file=inference_graph/detect.tflite --inference_type=QUANTIZED_UINT8 --input_shapes=1,300,300,3 --input_arrays=normalized_input_image_tensor --output_arrays=TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 --mean_values=128 --std_dev_values=127 --allow_custom_ops --default_ranges_min=0 --default_ranges_max=6
The model is run as follows on the Google Coral dev board:
export DISPLAY=:0 && edgetpu_detect \
--source /dev/video1:YUY2:1280x720:20/1 \
--model ${DEMO_FILES}/converted_tflite_quant_model.tflite
The Google Coral came prepackaged with a demo facial-recognition model (below). The only difference is the model. The command below works perfectly and tracks my face across the screen; the command above results in the photos below.
export DISPLAY=:0 && edgetpu_detect \
--source /dev/video1:YUY2:1280x720:20/1 \
--model ${DEMO_FILES}/mobilenet_ssd_v2_face_quant_postprocess_edgetpu.tflite
What could possibly be going wrong here?
Edit, a few months later: I've retrained it with mobilenet_v1, with the exact same result.
Hmm, I wonder if the source camera could also be a problem?
Can you double check if the pixel format, size, and FPS values are correct?
Also, on both images that you posted, the process seems to be loading your converted_tflite_quant_model.tflite model; are you sure you tried the other model?
Were you able to run the demo with the correct result?
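If it helps to separate a model problem from a camera problem, here is a rough sketch (the image file name is my assumption, and it uses the CPU detect.tflite from before EdgeTPU compilation) that runs one still image through the converted model with the TFLite interpreter:
# Hedged sanity check: run the converted (non-EdgeTPU) detect.tflite on a single
# still image instead of the camera stream. The image file name is an assumption.
import numpy as np
from PIL import Image
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="inference_graph/detect.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# The SSD graph expects a 300x300 uint8 RGB image.
img = Image.open("test.jpg").convert("RGB").resize((300, 300))
interpreter.set_tensor(inp['index'], np.expand_dims(np.array(img, dtype=np.uint8), 0))
interpreter.invoke()

# Usual SSD postprocess output order: boxes, classes, scores, num_detections.
boxes, classes, scores, count = [
    interpreter.get_tensor(d['index']) for d in interpreter.get_output_details()
]
print("Top scores:", scores[0][:5])
print("Top boxes:", boxes[0][:5])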
I have trained a detection model, which does great when embedded in the TensorFlow sample app.
After freezing with export_tflite_ssd_graph and converting to TFLite using toco, the results perform rather badly and vary hugely.
Reading this answer on a similar problem with loss of accuracy, I wanted to try tflite_diff_example_test on a TensorFlow Docker machine.
As the documentation is not that evolved right now, I built the tool referencing this SO post,
using:
bazel build tensorflow/contrib/lite/testing/tflite_diff_example_test.cc, which ran smoothly.
After figuring out all the input parameters I needed, I tried the test script with the following commands:
~/.cache/bazel/_bazel_root/68a62076e91007a7908bc42a32e4cff9/external/bazel_tools/tools/test/test-setup.sh tensorflow/contrib/lite/testing/tflite_diff_example_test '--tensorflow_model=/tensorflow/shared/exported/tflite_graph.pb' '--tflite_model=/tensorflow/shared/exported/detect.tflite' '--input_layer=a,b,c,d' '--input_layer_type=float,float,float,float' '--input_layer_shape=1,3,4,3:1,3,4,3:1,3,4,3:1,3,4,3' '--output_layer=x,y'
and
bazel-bin/tensorflow/contrib/lite/testing/tflite_diff_example_test --tensorflow_model="/tensorflow/shared/exported/tflite_graph.pb" --tflite_model="/tensorflow/shared/exported/detect.tflite" --input_layer=a,b,c,d --input_layer_type=float,float,float,float --input_layer_shape=1,3,4,3:1,3,4,3:1,3,4,3:1,3,4,3 --output_layer=x,y
Both ways fail. Errors:
First way:
tflite_diff_example_test.cc:line 1: /bazel: Is a directory
tflite_diff_example_test.cc: line 3: syntax error near unexpected token '('
tflite_diff_example_test.cc: line 3: 'Licensed under the Apache License, Version 2.0 (the "License");'
/root/.cache/bazel/_bazel_root/68a62076e91007a7908bc42a32e4cff9/external/bazel_tools/tools/test/test-setup.sh: line 184: /tensorflow/: Is a directory
/root/.cache/bazel/_bazel_root/68a62076e91007a7908bc42a32e4cff9/external/bazel_tools/tools/test/test-setup.sh: line 276: /tensorflow/: Is a directory
Second way:
2018-09-10 09:34:27.650473: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Failed to create session. Op type not registered 'TFLite_Detection_PostProcess' in binary running on d36de5b65187. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.)tf.contrib.resamplershould be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
I would really appreciate any help that enables me to compare the output of the two graphs using TensorFlow's provided tests.
The second way you mentioned is the correct way to use tflite_diff. However, the object detection model containing the TFLite_Detection_PostProcess op cannot be run via tflite_diff.
tflite_diff runs the provided TensorFlow (.pb) model in the TensorFlow runtime and runs the provided TensorFlow Lite (.tflite) model in the TensorFlow Lite runtime. In order to run the .pb model in the TensorFlow runtime, all of the operations must be implemented in TensorFlow.
However, in the model you provided, the TFLite_Detection_PostProcess op is not implemented in the TensorFlow runtime; it is only available in the TensorFlow Lite runtime, so TensorFlow cannot resolve the op. You therefore unfortunately cannot use the tflite_diff tool with this model.
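As a partial workaround (not a substitute for tflite_diff), you can at least inspect the .tflite side manually by feeding it a fixed, reproducible input and dumping its outputs; the .pb side would then have to be compared by hand on the raw tensors that feed the postprocess op. A minimal sketch, using the model path from your command:
# Hedged sketch: inspect the TFLite model's outputs on a fixed, reproducible input.
# This only exercises the TFLite runtime; it does not run the .pb in TensorFlow.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="/tensorflow/shared/exported/detect.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

np.random.seed(0)  # fixed seed so repeated runs are comparable
if np.issubdtype(inp['dtype'], np.integer):
    data = np.random.randint(0, 256, size=inp['shape'], dtype=inp['dtype'])
else:
    data = np.random.random_sample(inp['shape']).astype(inp['dtype'])

interpreter.set_tensor(inp['index'], data)
interpreter.invoke()

for out in interpreter.get_output_details():
    print(out['name'], interpreter.get_tensor(out['index']))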
I am running the sample program that comes packaged with the TensorFlow Object Detection API (object_detection_tutorial.ipynb).
The program runs fine with no errors, but the bounding boxes are not displayed at all.
My environment is as follows:
Windows 10
Python 3.6.3
What could be the reason?
With regards
Manish
It seems that the latest version of the model, ssd_mobilenet_v1_coco_2017_11_08, doesn't work and outputs abnormally low scores. Replacing it in the Jupyter notebook with an older version of the model works fine for me:
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
Ref: https://github.com/tensorflow/models/issues/2773
Please try the updated SSD models in the detection zoo: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md. This should be fixed.