Deeplab xception for mobile (tensorflow lite) - tensorflow

I am checking the option to run image segmentation using the pre-trained deeplab xception65_coco_voc_trainval model.
The frozen model is ~161MB; after conversion to tflite it is ~160MB, and running it on my PC's CPU takes ~25 seconds.
Is that expected, or is there something I can do better?
The conversion to tflite is as follows:
tflite_convert \
--graph_def_file="deeplabv3_pascal_trainval/frozen_inference_graph.pb" \
--output_file="deeplab_xception_pascal.tflite" \
--output_format=TFLITE \
--input_shape=1,513,513,3 \
--input_arrays="sub_7" \
--output_arrays="ArgMax" \
--inference_type=FLOAT \
--allow_custom_ops
Thanks!

According to https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md, xception65_coco_voc_trainval with 3 eval scales takes about 223 seconds. The frozen graph has a single eval scale, so ~25 seconds sounds about right to me.
To speed up TFLite inference I would normally suggest the GPU delegate, but since you are running on a PC you will need a smaller model instead. Maybe try one of the MobileNet-based models? The edgetpu models will also run in TFLite without an Edge TPU and should be quite fast, although they are trained on Cityscapes.
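When comparing candidate models, it helps to measure latency the same way for each one. The following is a minimal sketch, assuming any zero-argument callable stands in for the inference call (for a TFLite model that would typically be interpreter.invoke). The helper name average_latency and the warm-up and repeat counts are illustrative choices, not part of TensorFlow.

```python
import time


def average_latency(run_once, warmup=1, repeats=5):
    """Return the mean wall-clock seconds per call of run_once.

    run_once is any zero-argument callable; warm-up calls are executed
    first and excluded from the measurement.
    """
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Stand-in workload; with TFLite you would pass interpreter.invoke instead.
    seconds = average_latency(lambda: sum(range(100_000)))
    print(f"{seconds * 1000:.2f} ms per call")
```

Excluding the first call matters because the first invocation often includes one-time setup cost that would distort the average.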

Related

TensorFlow lite: High loss in accuracy after converting model to tflite

I have been trying TFLite to increase detection speed on Android but strangely my .tflite model now almost only detects 1 category.
I have done testing on the .pb model that I got after retraining a mobilenet and the results are good but for some reason, when I convert it to .tflite the detection is way off...
For the retraining I used the retrain.py file from TensorFlow for Poets 2.
I am using the following commands to retrain, optimize for inference and convert the model to tflite:
python retrain.py \
--image_dir ~/tf_files/tw/ \
--tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v1_100_224/feature_vector/1 \
--output_graph ~/new_training_dir/retrainedGraph.pb \
--saved_model_dir ~/new_training_dir/model/ \
--how_many_training_steps 500
sudo toco \
--input_file=retrainedGraph.pb \
--output_file=optimized_retrainedGraph.pb \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TENSORFLOW_GRAPHDEF \
--input_shape=1,224,224,3 \
--input_array=Placeholder \
--output_array=final_result
sudo toco \
--input_file=optimized_retrainedGraph.pb \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TFLITE \
--output_file=retrainedGraph.tflite \
--inference_type=FLOAT \
--inference_input_type=FLOAT \
--input_arrays=Placeholder \
--output_array=final_result \
--input_shapes=1,224,224,3
Am I doing anything wrong here? Where could the loss in accuracy come from?
I faced the same issue while converting a .pb model to .lite.
In fact, my accuracy dropped from 95% to 30%!
It turned out that my mistake was not in the conversion from .pb to .lite, or in the command used for it. It was in how I loaded and pre-processed the image before passing it to the lite model and running inference with the interpreter.invoke() command.
The code below shows what I mean by pre-processing:
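import cv2
import numpy as np

test_image = cv2.imread(file_name)
# interpolation must be passed by keyword; the third positional argument of cv2.resize is dst
test_image = cv2.resize(test_image, (299, 299), interpolation=cv2.INTER_AREA)
# scale pixel values to [0, 1] and add the batch dimension
test_image = np.expand_dims((test_image)/255, axis=0).astype(np.float32)
interpreter.set_tensor(input_tensor_index, test_image)
interpreter.invoke()
# output is assumed to be a callable returning the output tensor, e.g. from interpreter.tensor(...)
digit = np.argmax(output()[0])
# print(digit)
prediction = result[digit]  # result is assumed to be the list of class labels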
As you can see, two crucial pre-processing steps are applied to the image once it is read with imread():
i) The image must be resized to the "input_height" and "input_width" of the input tensor that was used during training. In my case (inception-v3) this was 299 for both. (Read the model's documentation for this value, or look for these variables in the file you used to train or retrain the model.)
ii) The next command in the above code is:
test_image = np.expand_dims((test_image)/255, axis=0).astype(np.float32)
This is a special case of the general formula in the model code:
test_image = np.expand_dims((test_image-input_mean)/input_std, axis=0).astype(np.float32)
Reading the documentation revealed that for my architecture input_mean = 0 and input_std = 255.
Once I made these changes to my code, I got the accuracy that was expected (90%).
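The normalization step can be isolated into a small helper so that the mean and std are explicit parameters. This is only a sketch: the function name normalize_for_model is made up for illustration, and the defaults are simply the values quoted above for my model. Your model may need different ones.

```python
import numpy as np


def normalize_for_model(image, input_mean=0.0, input_std=255.0):
    """Apply (image - input_mean) / input_std and add a batch dimension.

    image is an HxWxC array that has already been resized to the model's
    input size. The default mean/std are the values quoted in this answer;
    other models may need different ones.
    """
    scaled = (np.asarray(image, dtype=np.float32) - input_mean) / input_std
    return np.expand_dims(scaled, axis=0).astype(np.float32)


if __name__ == "__main__":
    # Synthetic 299x299 image standing in for a resized cv2.imread result.
    fake_image = np.full((299, 299, 3), 255, dtype=np.uint8)
    batch = normalize_for_model(fake_image)
    print(batch.shape, batch.dtype)
```

One more thing worth checking: cv2.imread returns channels in BGR order, so if the model was trained on RGB images the channel order may need converting as well.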
Hope this helps.
Please file an issue on GitHub https://github.com/tensorflow/tensorflow/issues and add the link here.
Also please add more details on what you are retraining the last layer for.

Good training total loss but inference give bad performance

I exported a graph from tensorflow using this command:
export_inference_graph.py --input_type image_tensor \
--pipeline_config_path=/mnt/data/pipeline.config \
--trained_checkpoint_prefix=/mnt/data/checkpoints/model.ckpt-1670059 \
--output_directory=/mnt/data/output/2018-04-23-1
My network converges.
However, when I use the network for inference, it is really fast but detects the wrong things or nothing at all.
I am using the Inception model. Any help would be amazing.
Thanks

How to get rid of additional ops added in the graph while fine-tuning Tensorflow Inception_V3 model?

I am trying to convert a fine-tuned tensorflow inception_v3 model to uff format which can be run on NVIDIA's Jetson TX2. For conversion to uff, certain ops are supported, some are not. I am able to successfully freeze and convert to uff inception_v3 model with imagenet checkpoint provided by tensorflow. However if I fine-tune the model, additional ops like Floor, RandomUniform, etc are added in the new graph which are not yet supported. These layers remain even after freezing the model. This is happening in the fine-tuning for flowers sample provided on tensorflow site as well.
I want to understand why additional ops are added to the graph when fine-tuning is only supposed to modify the final layer to match the required number of outputs.
If they are added during training, how can I get rid of them? What post-processing steps did the TensorFlow team follow before releasing the inception_v3 model for ImageNet?
I can share the pbtxt files if needed. For now, model layers details are uploaded at https://github.com/shrutim90/TF_to_UFF_Issue. I am using Tensorflow 1.6 with GPU.
I am following the steps to freeze or fine-tune the model from: https://github.com/tensorflow/models/tree/master/research/slim#Pretrained. As described in the above link, to reproduce the issue, install TF-Slim image models library and follow these steps:
1. python export_inference_graph.py \
--alsologtostderr \
--model_name=inception_v3 \
--output_file=/tmp/inception_v3_inf_graph.pb
2. python freeze_graph.py \
--input_graph=/tmp/inception_v3_inf_graph.pb \
--input_checkpoint=/tmp/checkpoints/inception_v3.ckpt \
--input_binary=true --output_graph=/tmp/frozen_inception_v3.pb \
--output_node_names=InceptionV3/Predictions/Reshape_1
3. DATASET_DIR=/tmp/flowers
TRAIN_DIR=/tmp/flowers-models/inception_v3
CHECKPOINT_PATH=/tmp/my_checkpoints/inception_v3.ckpt
python train_image_classifier.py --train_dir=$TRAIN_DIR --dataset_dir=$DATASET_DIR --dataset_name=flowers --dataset_split_name=train --model_name=inception_v3 --checkpoint_path=${CHECKPOINT_PATH} --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits
4. python freeze_graph.py \
--input_graph=/tmp/graph.pbtxt \
--input_checkpoint=/tmp/checkpoints/model.ckpt-2539 \
--input_binary=false --output_graph=/tmp/frozen_inception_v3_flowers.pb \
--output_node_names=InceptionV3/Predictions/Reshape_1
To check the layers, you can inspect the .pbtxt file or use NVIDIA's convert-to-uff utility.
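Run the steps in this order: training script -> export_inference_graph -> freeze_graph. That is, after fine-tuning, export a fresh inference graph with export_inference_graph.py and freeze that graph with the fine-tuned checkpoint, instead of freezing the graph.pbtxt written during training. This gets rid of all the extra nodes, and the model can then be easily converted to uff.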
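To confirm that the extra ops are really gone, you can scan a text-format graph for the op types in question. This is a rough sketch that assumes the usual GraphDef text layout, where each node carries a line of the form op: "Name". The function names are hypothetical, and the default list of unsupported ops contains only the two named in the question.

```python
import re
from collections import Counter

# Matches lines such as:   op: "RandomUniform"
_OP_LINE = re.compile(r'^\s*op:\s*"([^"]+)"', re.MULTILINE)


def count_ops(pbtxt_text):
    """Count op types in a text-format GraphDef (the contents of a .pbtxt)."""
    return Counter(_OP_LINE.findall(pbtxt_text))


def find_unsupported(pbtxt_text, unsupported=("Floor", "RandomUniform")):
    """Return the ops from `unsupported` that still appear in the graph."""
    counts = count_ops(pbtxt_text)
    return sorted(op for op in unsupported if counts[op] > 0)


if __name__ == "__main__":
    # Tiny hand-written graph fragment used only for illustration.
    sample = '''
    node {
      name: "input"
      op: "Placeholder"
    }
    node {
      name: "dropout/random_uniform/RandomUniform"
      op: "RandomUniform"
    }
    node {
      name: "dropout/Floor"
      op: "Floor"
    }
    '''
    print(find_unsupported(sample))
```

This only works on text-format graphs. A binary frozen .pb would have to be written out as text first.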

Can I convert the tensorflow inception pb model to tflite model?

The guide for converting a TensorFlow pb model only covers the MobileNet model:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite#step-2-model-format-conversion
So my question is, can I convert the tensorflow inception pb model to tflite model?
If yes, where can I get the checkpoint (ckpt) file? I can't find them for inception model in https://github.com/tensorflow/models/tree/master/research/slim/nets.
Did I miss anything?
Yes, you should also be able to convert an Inception model to TFLite. You only need the checkpoints if the graph is not yet frozen. If the graph is already frozen (which I assume), you should be able to convert it with the following command:
bazel run --config=opt //tensorflow/contrib/lite/toco:toco -- \
--input_file=**/path/to/your/graph.pb** \
--output_file=**/path/to/your/output.tflite** \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TFLITE \
--inference_type=FLOAT \
--input_shape=1,299,299,3 \
--input_array=**your_input** \
--output_array=**your_final_tensor**
(You have to replace the text between the asterisks with the arguments that apply to your case; --input_array=Mul, for example.)
Note on --input_array=Mul
Some of the TF ops used in the Inception v3 graph (DecodeJpeg, ExpandDims) are not supported by TFLite, since they typically do not need to be part of the model on the phone (these tasks are done directly in the app code). You therefore have to specify where TF Lite should hook into the graph.
Without --input_array you will probably get the following error message:
Some of the operators in the model are not supported by the standard TensorFlow Lite runtime. If you have a custom implementation for them you can disable this error with --allow_custom_ops. Here is a list of operators for which you will need custom implementations: DecodeJpeg, ExpandDims.
I hope this helps. I am currently struggling with converting retrained graphs myself.
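The same conversion can also be attempted from Python instead of bazel. This is a minimal sketch, assuming a TensorFlow installation that still provides tf.compat.v1.lite.TFLiteConverter. The wrapper name convert_frozen_graph is made up, and the paths and tensor names are placeholders you must replace, exactly as in the command above.

```python
def convert_frozen_graph(graph_path, output_path, input_array, output_array,
                         input_shape=(1, 299, 299, 3)):
    """Convert a frozen GraphDef (.pb) to a float TFLite model.

    input_array / output_array are tensor names in your graph
    (for example "Mul" as the input of Inception v3).
    Returns the size of the written model in bytes.
    """
    # Imported lazily so the function can be defined without TensorFlow installed.
    import tensorflow as tf

    converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
        graph_def_file=graph_path,
        input_arrays=[input_array],
        output_arrays=[output_array],
        input_shapes={input_array: list(input_shape)},
    )
    tflite_model = converter.convert()  # serialized model as bytes
    with open(output_path, "wb") as f:
        f.write(tflite_model)
    return len(tflite_model)
```

As with the command line, choosing an input tensor after the JPEG decoding step keeps the unsupported ops out of the converted model.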

How to train TensorFlow's deeplab model on Cityscapes?

Is it possible to train the current deeplab model in TensorFlow to reasonable accuracy using 4 GPUs with 11GB? I seem to be able to fit 2 batches per GPU, so am running a total batch size of 8 across 4 clones.
Following the instructions included with the model, I get a mean IoU of < 30% after 90,000 iterations.
PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim python deeplab/train.py \
--logtostderr --training_number_of_steps=90000 \
--train_split="train" --model_variant="xception_65" \
--atrous_rates=6 --atrous_rates=12 --atrous_rates=18 \
--output_stride=16 --decoder_output_stride=4 --train_crop_size=769 \
--train_crop_size=769 --train_batch_size=8 --num_clones=4 \
--dataset="cityscapes" \
--tf_initial_checkpoint=deeplab/models/xception/model.ckpt \
--train_logdir=$LOGDIR \
--dataset_dir=deeplab/datasets/cityscapes/tfrecord
I have tried with batch norm both enabled and disabled without much difference in outcome.
Thanks!
It seems I needed a much larger step length (learning rate) than the default. A value of 1e-2 gives results closer to the published ones, with batch size 15 and a smaller crop window size.
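To see why the starting value matters over a long run, here is a sketch of a polynomial-decay schedule. It rests on an assumption not stated in this thread: that training uses a "poly" policy with power 0.9 which decays to zero at the final step. The 90000 steps come from the question and 1e-2 from the value above; the function name is illustrative.

```python
def poly_learning_rate(step, base_lr=1e-2, total_steps=90000, power=0.9):
    """Polynomial ("poly") decay from base_lr at step 0 down to 0 at total_steps."""
    step = min(max(step, 0), total_steps)  # clamp to the valid range
    return base_lr * (1.0 - step / total_steps) ** power


if __name__ == "__main__":
    for step in (0, 30000, 60000, 90000):
        print(step, poly_learning_rate(step))
```

Under such a schedule every later step scales with the base value, so a base rate that is too small stays too small for the whole run.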
If you check this link: https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md
it has links to pretrained models for MobileNet v2 and DeepLab trained on Cityscapes. You can modify the existing shell scripts to train on Cityscapes.