I am trying to train on my own dataset with the DeepLab model in the TensorFlow model garden, starting from the pre-trained model provided in the official repo, and the loss decreases over time.
But when I try to visualize with the latest checkpoint, or freeze the model to a .pb and run inference, the output is nothing but a black image (I checked these images with NumPy; all pixels are 0).
My training script looks like this:
python deeplab/train.py \
--logtostderr \
--num_clones=1 \
--training_number_of_steps=500000 \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size="513,513" \
--train_batch_size=2 \
--dataset=${OWN_DATASET} \
--train_logdir=${TRAIN_LOGDIR} \
--dataset_dir=${DATASET_DIR} \
--tf_initial_checkpoint=${INITIAL_CHECKPOINT}
Has this happened to anyone before?
This is an old thread and I don't know if you still need help, but you haven't provided much information about your dataset. Here are some general pointers:
Try setting these flags in train.py:
--fine_tune_batch_norm=False \
--initialize_last_layer=False \
--last_layers_contain_logits_only=True \
Make sure the label masks in your SegmentationClassRaw folder use the values 0, 1, 2, 3, ..., where 0 is the background and 1, 2, 3, ... are the individual classes. Run np.asarray(image) on a mask to inspect the pixel values and confirm the labels are correct, for example:
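A minimal way to do that check (a sketch; the mask filename below is just a placeholder):

import numpy as np
from PIL import Image

# Placeholder path: point this at one of your SegmentationClassRaw masks.
mask = np.asarray(Image.open("SegmentationClassRaw/image_001.png"))
print(mask.shape, mask.dtype)  # expect (height, width), uint8
print(np.unique(mask))         # expect [0 1 2 ...] (plus 255 if you use an ignore label)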
If you have an unbalanced dataset, you can try setting weights for the different labels through train.py:
--label_weights=1 \ # Weight for label 0 (Background)
--label_weights=10 \ #Weight for label 1 (Object class 1)
--label_weights=15 \ #Weight for label 2 (Object class 2)
If all else fails, try a larger dataset. A dataset of 225 images and 2000 steps (starting from a MobileNet-v2 checkpoint) yielded results for me, although the accuracy/performance was not very good since the dataset was small. As a reference point, the loss on this small dataset was around 0.05-0.06 after 2000 steps.
I would like to use the EfficientNet Lite 0 model as a backbone to perform a keypoint regression task. However, I am stuck at loading the model from either TensorFlow Hub or the official GitHub repository. Could you please explain how I can:
import such model in Tensorflow with checkpoints from ImageNet
modify the last layers of the network
modify the loss according to my task
retrain the network
I am looking forward to applying EfficientNet Lite, since I would like to convert everything to TF Lite.
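For reference, a minimal Keras sketch of what the question describes, assuming the EfficientNet Lite0 feature-vector module on TF Hub and a hypothetical 16-keypoint task (not a tested recipe):

import tensorflow as tf
import tensorflow_hub as hub

NUM_KEYPOINTS = 16  # assumption: set this to your task

# ImageNet-pretrained EfficientNet Lite0 feature extractor (assumed TF Hub handle).
backbone = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/efficientnet/lite0/feature-vector/2",
    trainable=True)  # trainable=True fine-tunes the backbone

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),
    backbone,
    tf.keras.layers.Dense(2 * NUM_KEYPOINTS),  # (x, y) per keypoint replaces the classifier head
])

model.compile(optimizer="adam", loss="mse")  # regression loss instead of cross-entropy
# model.fit(train_images, train_keypoints, epochs=...)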
TensorFlow Lite currently doesn't support EfficientNet Lite, but it does support the mobile (CPU & GPU) friendly CenterNet. See this Colab that demonstrates how to use this model.
Commands to convert the keypoints model:
# Get mobile-friendly CenterNet for Keypoint detection task.
# See TensorFlow 2 Detection Model Zoo for more details:
# https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
wget http://download.tensorflow.org/models/object_detection/tf2/20210210/centernet_mobilenetv2fpn_512x512_coco17_kpts.tar.gz
tar -xf centernet_mobilenetv2fpn_512x512_coco17_kpts.tar.gz
rm centernet_mobilenetv2fpn_512x512_coco17_kpts.tar.gz*
# Export the intermediate SavedModel that outputs 10 detections & takes in an
# image of dim 320x320.
# Modify these parameters according to your needs.
python models/research/object_detection/export_tflite_graph_tf2.py \
--pipeline_config_path=centernet_mobilenetv2_fpn_kpts/pipeline.config \
--trained_checkpoint_dir=centernet_mobilenetv2_fpn_kpts/checkpoint \
--output_directory=centernet_mobilenetv2_fpn_kpts/tflite \
--centernet_include_keypoints=true \
--keypoint_label_map_path=centernet_mobilenetv2_fpn_kpts/label_map.txt \
--max_detections=10 \
--config_override=" \
model{ \
center_net { \
image_resizer { \
fixed_shape_resizer { \
height: 320 \
width: 320 \
} \
} \
} \
}"
tflite_convert --output_file=centernet_mobilenetv2_fpn_kpts/model.tflite \
--saved_model_dir=centernet_mobilenetv2_fpn_kpts/tflite/saved_model
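After the conversion, you can sanity-check the resulting model with the Python TFLite interpreter before deploying it (a sketch; the exact output tensors follow from the export flags above):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="centernet_mobilenetv2_fpn_kpts/model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
print(input_details[0]['shape'])  # expect [1, 320, 320, 3]

# Run one dummy frame through the model and list the outputs.
dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
for out in interpreter.get_output_details():
    print(out['name'], out['shape'])  # boxes, classes, scores, keypoints, ...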
I am training an object detection model with Azure customvision.ai. The model can be exported with TensorFlow, either as a SavedModel (.pb), .tf, or .tflite.
The model's output tensor is designated as float32[1,13,13,50].
I then push the .tflite onto a Google Coral Edge device and attempt to run it (previous .tflite models trained with Google Cloud worked, but I'm now bound to corporate Azure and need to use customvision.ai). These are the commands I run on the device:
$ mdt shell
$ export DEMO_FILES="/usr/lib/python3/dist*/edgetpu/demo"
$ export DISPLAY=:0 && edgetpu_detect \
--source /dev/video1:YUY2:1280x720:20/1 \
--model ${DEMO_FILES}/model.tflite
Finally, the model attempts to run, but it fails with a ValueError:
'This model has a {}.'.format(output_tensors_sizes.size)))
ValueError: Detection model should have 4 output tensors! This model has 1.
What is happening here? How do I reshape my tensorflow model to match the device requirements of 4 output tensors?
(Screenshots of the model that works and the model that does not work omitted.)
Edit: this outputs a .tflite model, but it still has only one output:
python tflite_convert.py \
--output_file=model.tflite \
--graph_def_file=saved_model.pb \
--saved_model_dir="C:\Users\b0588718\AppData\Roaming\Python\Python37\site-packages\tensorflow\lite\python" \
--inference_type=FLOAT \
--input_shapes=1,416,416,3 \
--input_arrays=Placeholder \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
--mean_values=128 \
--std_dev_values=128 \
--allow_custom_ops \
--change_concat_input_ranges=false \
--allow_nudging_weights_to_use_fast_gemm_kernel=true
You are running an object detection demo where the engine expects 4 outputs from the model, but your model has only one output. Maybe the tflite conversion was incorrect? For instance, if you grabbed the Face SSD model from our zoo, the conversion should look like this:
$ tflite_convert \
--output_file=face_ssd.tflite \
--graph_def_file=tflite_graph.pb \
--inference_type=QUANTIZED_UINT8 \
--input_shapes=1,320,320,3 \
--input_arrays normalized_input_image_tensor \
--output_arrays "TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3" \
--mean_values 128 \
--std_dev_values 128 \
--allow_custom_ops \
--change_concat_input_ranges=false \
--allow_nudging_weights_to_use_fast_gemm_kernel=true
Take a look at a similar query for more details:
https://github.com/google-coral/edgetpu/issues/135#issuecomment-640677917
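If you want to see what your converted model actually exposes before pushing it to the Coral, you can list its output tensors with the TFLite interpreter; a minimal sketch (a correctly converted SSD-style detection model should show the four TFLite_Detection_PostProcess outputs):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # your converted model
interpreter.allocate_tensors()

outputs = interpreter.get_output_details()
print(len(outputs))  # the edgetpu_detect demo expects 4
for out in outputs:
    print(out['name'], out['shape'])  # boxes, classes, scores, num_detections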
I am using TF 1.12 to convert a simple MNIST model with two conv2d layers to TF Lite.
F32:
tflite_convert --output_file model_lite/conv_net_f32.tflite \
--graph_def_file frozen_graphs/conv_net.pb \
--input_arrays "input" \
--input_shapes "1,784" \
--output_arrays output \
--output_format TFLITE
UINT8:
tflite_convert --output_file model_lite/conv_net_uint8.tflite \
--graph_def_file frozen_graphs/conv_net.pb \
--input_arrays "input" \
--input_shapes "1,784" \
--output_arrays output \
--output_format TFLITE \
--mean_values 0 \
--std_dev_values 255 \
--default_ranges_min 0 \
--default_ranges_max 255 \
--inference_type QUANTIZED_UINT8 \
--inference_input_type QUANTIZED_UINT8
However, I found that the execution time of the quantized uint8 version is slower than the f32 version.
Results:
It does not make sense to me.
Does anyone know the reason?
Thanks for any inputs!
I think you should try conversion with the following command:
tflite_convert --output_file model_lite/conv_net_uint8.tflite \
--graph_def_file frozen_graphs/conv_net.pb \
--input_arrays "input" \
--input_shapes "1,784" \
--output_arrays output \
--output_format TFLITE \
--mean_values 128 \
--std_dev_values 127 \
--default_ranges_min 0 \
--default_ranges_max 1 \
--inference_type QUANTIZED_UINT8 \
--inference_input_type QUANTIZED_UINT8
default_ranges_min and default_ranges_max correspond to the minimum and maximum values in your network, i.e. the range of your activation functions.
If you are using activation such as relu6, you should change default_ranges_max to 6.
See this Stack Overflow question for information about mean_values and std_dev_values. They depend on your training data.
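As a rough sketch of the relationship these flags encode (the converter maps uint8 inputs back to real values as real_value = (quantized_value - mean_value) / std_dev_value):

# mean_values / std_dev_values map a uint8 input back to a real value:
#   real_value = (quantized_value - mean_value) / std_dev_value
def dequantize(q, mean_value, std_dev_value):
    return (q - mean_value) / std_dev_value

print(dequantize(0, 0, 255), dequantize(255, 0, 255))      # 0.0, 1.0   -> inputs scaled to [0, 1]
print(dequantize(0, 128, 127), dequantize(255, 128, 127))  # ~-1.0, 1.0 -> inputs scaled to [-1, 1]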
Unless the hardware has special support for fast 8-bit instructions, quantized models are not expected to be any faster than FP32 models.
E.g. TFLite uint8 models run at the same speed as FP32 on a Raspberry Pi (aarch64), since it lacks such instructions.
I have been trying TFLite to increase detection speed on Android, but strangely my .tflite model now detects almost only one category.
I have tested the .pb model I got after retraining a MobileNet and the results are good, but for some reason, when I convert it to .tflite the detection is way off...
For the retraining I used the retrain.py file from TensorFlow for Poets 2.
I am using the following commands to retrain, optimize for inference and convert the model to tflite:
python retrain.py \
--image_dir ~/tf_files/tw/ \
--tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v1_100_224/feature_vector/1 \
--output_graph ~/new_training_dir/retrainedGraph.pb \
--saved_model_dir ~/new_training_dir/model/ \
--how_many_training_steps 500
sudo toco \
--input_file=retrainedGraph.pb \
--output_file=optimized_retrainedGraph.pb \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TENSORFLOW_GRAPHDEF \
--input_shape=1,224,224,3 \
--input_array=Placeholder \
--output_array=final_result
sudo toco \
--input_file=optimized_retrainedGraph.pb \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TFLITE \
--output_file=retrainedGraph.tflite \
--inference_type=FLOAT \
--inference_input_type=FLOAT \
--input_arrays=Placeholder \
--output_array=final_result \
--input_shapes=1,224,224,3
Am I doing anything wrong here? Where could the loss in accuracy come from?
I faced the same issue while I was trying to convert a .pb model into .lite.
In fact, my accuracy dropped from 95% to 30%!
It turns out the mistake I was making was not in the conversion of .pb to .lite, or in the command used to do so. It was actually in how the image was loaded and pre-processed before being passed into the lite model and inferred with interpreter.invoke(). The code below is what I mean by pre-processing:
import cv2
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path=model_file)  # model_file: path to your .tflite model
interpreter.allocate_tensors()
input_tensor_index = interpreter.get_input_details()[0]['index']
output_tensor_index = interpreter.get_output_details()[0]['index']

test_image = cv2.imread(file_name)  # file_name: path to the test image
test_image = cv2.resize(test_image, (299, 299), interpolation=cv2.INTER_AREA)
test_image = np.expand_dims(test_image / 255, axis=0).astype(np.float32)
interpreter.set_tensor(input_tensor_index, test_image)
interpreter.invoke()
digit = np.argmax(interpreter.get_tensor(output_tensor_index)[0])
prediction = result[digit]  # result: list of class label strings
As you can see, there are two crucial pre-processing steps applied to the image once it is read with imread():
i) The image should be resized to the "input_height" and "input_width" of the input image/tensor used during training. In my case (Inception-v3) this was 299 for both "input_height" and "input_width". (Read the model's documentation for this value, or look for this variable in the file you used to train or retrain the model.)
ii) The next command in the above code is:
test_image = np.expand_dims((test_image)/255, axis=0).astype(np.float32)
I got this from the model's pre-processing formula:
test_image = np.expand_dims((test_image-input_mean)/input_std, axis=0).astype(np.float32)
Reading the documentation revealed that for my architecture input_mean = 0 and input_std = 255.
When I made these changes to my code, I got the expected accuracy (90%).
Hope this helps.
Please file an issue on GitHub https://github.com/tensorflow/tensorflow/issues and add the link here.
Also please add more details on what you are retraining the last layer for.
Is it possible to train the current DeepLab model in TensorFlow to reasonable accuracy using 4 GPUs with 11 GB each? I seem to be able to fit a batch size of 2 per GPU, so I am running a total batch size of 8 across 4 clones.
Following the instructions included with the model, I get a mean IoU of < 30% after 90,000 iterations.
PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim python deeplab/train.py \
--logtostderr --training_number_of_steps=90000 \
--train_split="train" --model_variant="xception_65" \
--atrous_rates=6 --atrous_rates=12 --atrous_rates=18 \
--output_stride=16 --decoder_output_stride=4 --train_crop_size=769 \
--train_crop_size=769 --train_batch_size=8 --num_clones=4 \
--dataset="cityscapes" \
--tf_initial_checkpoint=deeplab/models/xception/model.ckpt \
--train_logdir=$LOGDIR \
--dataset_dir=deeplab/datasets/cityscapes/tfrecord
I have tried with batch norm both enabled and disabled without much difference in outcome.
Thanks!
It seems I needed a much larger learning rate (step size) than the default: 1e-2 gives results closer to the published ones, with a batch size of 15 and a smaller crop size.
If you check this link: https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md
it has links to pretrained MobileNet-v2 + DeepLab models trained on Cityscapes. You can modify the existing shell scripts in the repo to train on Cityscapes.