Freeze mobilenet_v2 (freeze_graph.py) - tensorflow

I trained mobilenet_v2 using slim's train_image_classifier.py. My command was (learning rate = 0.045 * 2 and epochs per decay = 2.5 / 2, written out below as 0.09 and 1.25):
python3 ~/models/research/slim/train_image_classifier.py \
--model_name="mobilenet_v2" \
--learning_rate=0.09 \
--preprocessing_name="inception_v2" \
--label_smoothing=0.1 \
--moving_average_decay=0.9999 \
--batch_size=16 \
--num_clones=2 \
--learning_rate_decay_factor=0.98 \
--num_epochs_per_decay=1.25 \
--train_dir=[...]/tensorflow_logs/mobilenet_v2 \
--dataset_dir=[...]/imagenet_data \
--dataset_name='imagenet' \
--train_image_size=229
It created the following files in mobilenet_v2/ (not the full list):
checkpoint
graph.pbtxt
[...]
model.ckpt-697555.data-00000-of-00001
model.ckpt-697555.index
model.ckpt-697555.meta
I am able to do inference using my checkpoint.
I am now struggling to convert my checkpoint variables to Const ops with freeze_graph. When I try:
python3 -m tensorflow.python.tools.freeze_graph \
  --input_graph [...]/tensorflow_logs/mobilenet_v2/graph.pbtxt \
  --input_checkpoint [...]/tensorflow_logs/mobilenet_v2/model.ckpt-697555 \
  --input_binary false \
  --output_graph /mnt/sda1/tensorflow_logs/mobilenet_v2/mobilenet_v2_frozen.pb \
  --output_node_names MobilenetV2/Predictions/Reshape_1
I get: AssertionError: MobilenetV2/Predictions/Reshape_1 is not in graph
However the output of: [print(n.name) for n in tf.get_default_graph().as_graph_def().node] contains:
MobilenetV2/Logits/Squeeze
MobilenetV2/Logits/output
MobilenetV2/Predictions/Reshape/shape
MobilenetV2/Predictions/Reshape
MobilenetV2/Predictions/Softmax
MobilenetV2/Predictions/Shape
MobilenetV2/Predictions/Reshape_1
Hence my confusion.
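A quick sanity check (a rough sketch; the path below is a placeholder) is to list the node names from the graph.pbtxt that freeze_graph is actually given, since the training graph on disk is not necessarily the same graph in which I printed the node names above:
import tensorflow as tf
from google.protobuf import text_format

# Parse the text-format GraphDef that freeze_graph reads with --input_binary false
# (tf.GraphDef in TF1; tf.compat.v1.GraphDef in TF2)
with open('[...]/tensorflow_logs/mobilenet_v2/graph.pbtxt') as f:
    graph_def = text_format.Parse(f.read(), tf.GraphDef())

# Print only the prediction-related node names
print([n.name for n in graph_def.node if 'Predictions' in n.name])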
Last time I was able to freeze a graph I used the input_saved_model_dir option to load a SavedModel (I believe SavedModels are now the preferred format?) instead of input_graph and input_checkpoint. Unfortunately the train_image_classifier.py script doesn't create one, only checkpoints.
Any ideas/comments appreciated!

Related

Reshaping tensorflow output tensors

I am training an object detection model with Azure customvision.ai. The model is exported for TensorFlow, either as a saved model .pb, .tf, or .tflite.
The model's output type is designated as float32[1,13,13,50].
I then push the .tflite onto a Google Coral Edge device and attempt to run it (previous .tflite models trained with Google Cloud worked, but I'm now bound to corporate Azure and need to use customvision.ai). The commands are:
$ mdt shell
$ export DEMO_FILES="/usr/lib/python3/dist*/edgetpu/demo"
$ export DISPLAY=:0 && edgetpu_detect \
  --source /dev/video1:YUY2:1280x720:20/1 \
  --model ${DEMO_FILES}/model.tflite
Finally, the model attempts to run, but results in a ValueError
'This model has a {}.'.format(output_tensors_sizes.size)))
ValueError: Detection model should have 4 output tensors! This model has 1.
What is happening here? How do I reshape my tensorflow model to match the device requirements of 4 output tensors?
(Screenshot: the model that works)
(Screenshot: the model that does not work)
Edit: this outputs a tflite model, but it still has only one output:
python tflite_convert.py \
--output_file=model.tflite \
--graph_def_file=saved_model.pb \
--saved_model_dir="C:\Users\b0588718\AppData\Roaming\Python\Python37\site-packages\tensorflow\lite\python" \
--inference_type=FLOAT \
--input_shapes=1,416,416,3 \
--input_arrays=Placeholder \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
--mean_values=128 \
--std_dev_values=128 \
--allow_custom_ops \
--change_concat_input_ranges=false \
--allow_nudging_weights_to_use_fast_gemm_kernel=true
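The number of outputs can also be checked directly with the TFLite interpreter (a short sketch; the model path is a placeholder). A working SSD detection .tflite should report four output tensors (boxes, classes, scores, number of detections), while this one reports only one:
import tensorflow as tf

# Load the converted model and list its output tensors
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
for detail in interpreter.get_output_details():
    print(detail['name'], detail['shape'], detail['dtype'])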
You are running an object detection demo where the engine expects 4 outputs from the model, but your model has only one output. Maybe the tflite conversion was incorrect? For instance, if you grabbed the Face SSD model from our zoo, the conversion should look like this:
$ tflite_convert \
--output_file=face_ssd.tflite \
--graph_def_file=tflite_graph.pb \
--inference_type=QUANTIZED_UINT8 \
--input_shapes=1,320,320,3 \
--input_arrays normalized_input_image_tensor \
--output_arrays "TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3" \
--mean_values 128 \
--std_dev_values 128 \
--allow_custom_ops \
--change_concat_input_ranges=false \
--allow_nudging_weights_to_use_fast_gemm_kernel=true
Take a look at a similar query for more details:
https://github.com/google-coral/edgetpu/issues/135#issuecomment-640677917

Why the execution time of quantized Uint8 TF-lite model is slower than F32 version?

I am using TF1.12 to convert a simple MNIST model with two conv2d layers to TF-Lite.
F32:
tflite_convert --output_file model_lite/conv_net_f32.tflite \
--graph_def_file frozen_graphs/conv_net.pb \
--input_arrays "input" \
--input_shapes "1,784" \
--output_arrays output \
--output_format TFLITE
UINT8:
tflite_convert --output_file model_lite/conv_net_uint8.tflite \
--graph_def_file frozen_graphs/conv_net.pb \
--input_arrays "input" \
--input_shapes "1,784" \
--output_arrays output \
--output_format TFLITE \
--mean_values 0 \
--std_dev_values 255 \
--default_ranges_min 0 \
--default_ranges_max 255 \
--inference_type QUANTIZED_UINT8 \
--inference_input_type QUANTIZED_UINT8
However, I found that the execution time of the quantized uint8 version is slower than the f32 version.
Results: (screenshot)
It does not make sense to me.
Does anyone know the reason?
Thanks for any inputs!
I think you should try the conversion with the following command:
tflite_convert --output_file model_lite/conv_net_uint8.tflite \
--graph_def_file frozen_graphs/conv_net.pb \
--input_arrays "input" \
--input_shapes "1,784" \
--output_arrays output \
--output_format TFLITE \
--mean_values 128 \
--std_dev_values 127 \
--default_ranges_min 0 \
--default_ranges_max 1 \
--inference_type QUANTIZED_UINT8 \
--inference_input_type QUANTIZED_UINT8
default_ranges_min and default_ranges_max correspond to the minimum and maximum values in your network, i.e. the range of your activation functions.
If you are using activation such as relu6, you should change default_ranges_max to 6.
See this Stack Overflow question for information about mean_values and std_dev_values. They depend on your training data.
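For context, TOCO's uint8 input convention is real_value = (quantized_value - mean_value) / std_dev_value, so the (mean, std) pair encodes the float range the network expects at its input. A small sketch of what the two choices above imply:
# real_value = (uint8_value - mean_value) / std_dev_value
def real_range(mean_value, std_dev_value):
    lo = (0 - mean_value) / std_dev_value
    hi = (255 - mean_value) / std_dev_value
    return lo, hi

print(real_range(0, 255))    # (0.0, 1.0)        -> inputs expected in [0, 1]
print(real_range(128, 127))  # (-1.008..., 1.0)  -> inputs expected in roughly [-1, 1]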
Unless the hardware has special support for fast 8-bit instructions, quantized models are not expected to be any faster than FP32 models.
E.g. tflite uint8 models run at the same speed as fp32 on a Raspberry Pi aarch64, since that ARM device lacks such instructions.

Training my own dataset on deeplab but inference all black

I am trying to train my own dataset on the DeepLab model from the TensorFlow model garden. I get a decreasing loss over time, and I am using the pre-trained model provided by the official repo.
But when I visualize with the latest checkpoint, or freeze the model to .pb and run inference, I get nothing but a black image (I checked these images with NumPy; all pixels are 0).
My training script looks like this:
python deeplab/train.py \
--logtostderr \
--num_clones=1 \
--training_number_of_steps=500000 \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size="513,513" \
--train_batch_size=2 \
--dataset=${OWN_DATASET} \
--train_logdir=${TRAIN_LOGDIR} \
--dataset_dir=${DATASET_DIR} \
--tf_initial_checkpoint=${INITIAL_CHECKPOINT}
Has this happened to anyone before?
This is an old thread and I don't know if you still need help, but you haven't provided much information regarding your dataset. Here are some general pointers:
Try setting these flags in train.py
--fine_tune_batch_norm=False \
--initialize_last_layer=False \
--last_layers_contain_logits_only=True \
Make sure the label masks in your SegmentationClassRaw folder are 0, 1, 2, 3, ... where 0 is the background and 1, 2, 3, ... are the individual classes. Run np.asarray(image) to see these pixels and make sure the labels are correct (see the snippet at the end of this answer).
If you have an unbalanced dataset, you can try setting the weights for the different labels through train.py:
--label_weights=1 \  # Weight for label 0 (background)
--label_weights=10 \ # Weight for label 1 (object class 1)
--label_weights=15 \ # Weight for label 2 (object class 2)
If all fails, try a larger dataset. A dataset size of 225 images and 2000 steps (with an initial mobilenetv2 checkpoint) yielded results for me, although the accuracy/performance was not very good since the dataset size was small. Just as a reference point, the loss of this small dataset was around 0.05-0.06 after 2000 steps.
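For the label-mask check mentioned above, a minimal sketch (the file name is a placeholder):
import numpy as np
from PIL import Image

# The unique pixel values of a raw label mask should be 0 (background), 1, 2, ...
mask = np.asarray(Image.open('SegmentationClassRaw/image_0001.png'))
print(mask.shape, mask.dtype)
print(np.unique(mask))  # expect e.g. [0 1 2]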

How to convert ssd_resnet_50 tensorflow checkpoint to .tflite?

I'm trying to convert the ssd_resnet_50 model from the tensorflow Object Detection API to .tflite format but it doesn't work.
Some background:
I'm able to successfully convert both the out-of-the-box and the retrained ssd_mobilenet_v2_quantized model to .tflite and run the .tflite model.
Because the ssd_resnet_50 model is not quantized, I've added the following to the ssd_resnet_50 pipeline.config file and retrained the model:
graph_rewriter {
  quantization {
    delay: 48000
    weight_bits: 8
    activation_bits: 8
  }
}
After retraining ssd_resnet_50, I try to convert the model to .tflite format with the following commands:
# Produces tflite_graph.pb
python3 object_detection/export_tflite_ssd_graph.py \
--pipeline_config_path=pipeline.config \
--trained_checkpoint_prefix=model.ckpt-50000 \
--output_directory=$OUTPUT_DIR \
--add_postprocessing_op=true
# Produces detect.tflite
bazel run -c opt tensorflow/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,640,640,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
--inference_type=QUANTIZED_UINT8 \
--mean_values=128 \
--std_values=128 \
--change_concat_input_ranges=false \
--allow_custom_ops
Normally, TOCO would produce a valid detect.tflite that could be run. However, TOCO runs into the following error regarding quantization and Relu6.
Can anyone help?
Error:
2019-05-21 10:41:07.885065: F tensorflow/lite/toco/tooling_util.cc:1718] Array WeightSharedConvolutionalBoxPredictor_2/BoxPredictionTower/conv2d_0/BatchNorm/feature_2/FusedBatchNorm_mul_0, which is an input to the Add operator producing the output array WeightSharedConvolutionalBoxPredictor_2/Relu6, is lacking min/max data, which is necessary for quantization. If accuracy matters, either target a non-quantized output format, or run quantized training with your model from a floating point checkpoint to change the input graph to contain min/max information. If you don't care about accuracy, you can pass --default_ranges_min= and --default_ranges_max= for easy experimentation.
run_toco.sh: line 25: 3280 Aborted (core dumped) bazel run -c opt tensorflow/lite/toco:toco -- --input_file=$OUTPUT_DIR/tflite_graph.pb --output_file=$OUTPUT_DIR/detect.tflite --input_shapes=1,640,640,3 --input_arrays=normalized_input_image_tensor --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' --inference_type=QUANTIZED_UINT8 --mean_values=128 --std_values=128 --change_concat_input_ranges=false --allow_custom_ops
Reading your error, it seems that the array WeightSharedConvolutionalBoxPredictor_2/BoxPredictionTower/conv2d_0/BatchNorm/feature_2/FusedBatchNorm_mul_0, which is an input to WeightSharedConvolutionalBoxPredictor_2/Relu6, does not have the min/max information needed for quantization.
You can look at the "Use dummy-quantization to try out quantized inference on a float graph" section of the converter documentation for an example and some details.
You can add --default_ranges_min=0 --default_ranges_max=255 to your command, but you will lose accuracy doing so:
bazel run -c opt tensorflow/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,640,640,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
--inference_type=QUANTIZED_UINT8 \
--mean_values=128 \
--std_values=128 \
--default_ranges_min=0 \
--default_ranges_max=255 \
--change_concat_input_ranges=false \
--allow_custom_ops
From the TensorFlow converter command-line reference:
--default_ranges_min, --default_ranges_max. Type: floating-point. Default value for the (min, max) range values used for all arrays without a specified range. Allows user to proceed with quantization of non-quantized or incorrectly-quantized input files. These flags produce models with low accuracy. They are intended for easy experimentation with quantization via "dummy quantization"
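Roughly speaking, under the usual asymmetric uint8 scheme such a (min, max) range maps to a scale and zero point with real_value ≈ scale * (uint8_value - zero_point). A small sketch (not TOCO's exact code) of what the dummy range (0, 255) implies:
def quant_params(range_min, range_max):
    scale = (range_max - range_min) / 255.0
    zero_point = int(round(-range_min / scale))
    return scale, zero_point

print(quant_params(0.0, 255.0))  # (1.0, 0): the dummy-quantization defaults above
print(quant_params(0.0, 6.0))    # (~0.0235, 0): e.g. the range of a relu6 activation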

Specified output array "TFlite_Detection_PostProcess" is not produced by any op in this graph, even though it exists in the graph

I was following the instructions in this for converting my ssd_mobilenet_v2_coco model to tflite. I already exported my model to tflite, which produced tflite_graph.pb, and then I convert it to a .tflite file with the command line:
tflite_convert --graph_def_file=tflite_graph.pb \
--output_file=detect1.tflite \
--input_shapes=1,300,300,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays=TFlite_Detection_PostProcess \
--change_concat_input_ranges=false \
--allow_custom_ops
It says that
Check failed: GetOpWithOutput(model, output_array) Specified output
array "TFlite_Detection_PostProcess" is not produced by any op in this
graph. Is it a typo? To silence this message, pass this flag:
allow_nonexistent_arrays.
TFLite_Detection_PostProcess really does exist in my graph; here is the screenshot that proves it.
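Output array names appear to be matched exactly (including case), so one way to double-check the name is to print it from the exported graph (a sketch; the path is a placeholder):
import tensorflow as tf

# Read the binary GraphDef exported by export_tflite_ssd_graph.py
# (tf.GraphDef in TF1; tf.compat.v1.GraphDef in TF2)
graph_def = tf.GraphDef()
with open('tflite_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

print([n.name for n in graph_def.node if 'PostProcess' in n.name])
# e.g. ['TFLite_Detection_PostProcess'] -- note the capital 'L' in 'TFLite'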