How to train TensorFlow's deeplab model on Cityscapes? - tensorflow

Is it possible to train the current deeplab model in TensorFlow to reasonable accuracy using 4 GPUs with 11 GB of memory each? I seem to be able to fit 2 images per GPU, so I am running a total batch size of 8 across 4 clones.
Following the instructions included with the model, I get a mean IoU below 30% after 90,000 iterations:
PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim python deeplab/train.py \
--logtostderr --training_number_of_steps=90000 \
--train_split="train" --model_variant="xception_65" \
--atrous_rates=6 --atrous_rates=12 --atrous_rates=18 \
--output_stride=16 --decoder_output_stride=4 --train_crop_size=769 \
--train_crop_size=769 --train_batch_size=8 --num_clones=4 \
--dataset="cityscapes" \
--tf_initial_checkpoint=deeplab/models/xception/model.ckpt \
--train_logdir=$LOGDIR \
--dataset_dir=deeplab/datasets/cityscapes/tfrecord
I have tried with batch norm both enabled and disabled without much difference in outcome.
Thanks!

It seems I needed a much larger learning rate (step length) than the default. A value of 1e-2 gives results closer to the published ones, together with a batch size of 15 and a smaller crop size.
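To get a feel for what a step length of 1e-2 means over 90,000 steps, here is a minimal sketch of a polynomial ("poly") decay schedule. It assumes the schedule has the form base_lr * (1 - step / total_steps) ** power with power 0.9, which as far as I know is what deeplab's train.py uses by default, and it assumes the default base learning rate is 1e-4 (set through --base_learning_rate, to my knowledge). Treat both as assumptions and check them against your copy of train.py. The function name is made up for this example.

```python
def poly_learning_rate(base_lr, step, total_steps, power=0.9, end_lr=0.0):
    """Polynomially decay base_lr towards end_lr over total_steps.

    power=0.9 and end_lr=0.0 are assumed defaults, not values from the thread.
    """
    step = min(step, total_steps)  # hold the final value after the last step
    decay = (1.0 - step / total_steps) ** power
    return (base_lr - end_lr) * decay + end_lr


total_steps = 90000
for base_lr in (1e-4, 1e-2):  # assumed default vs. the value from the answer
    schedule = [poly_learning_rate(base_lr, s, total_steps)
                for s in (0, 45000, 90000)]
    print(base_lr, schedule)
```

With the smaller base rate, the learning rate is already tiny halfway through training, which may be one reason the run above stalls at a low mean IoU.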

If you check this link: https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md
it has links to pretrained MobileNet v2 and DeepLab models trained on Cityscapes. You can modify the existing shell scripts present here to train on Cityscapes.

Related

Deeplab xception for mobile (tensorflow lite)

I am checking the option to run image segmentation using the pre-trained deeplab xception65_coco_voc_trainval model.
The frozen model size is ~161MB, after I convert it to tflite the size is ~160MB, and running this model on my PC cpu takes ~25 seconds.
Is that "expected", or is there something I can do better?
The conversion to tflite is as follows:
tflite_convert \
--graph_def_file="deeplabv3_pascal_trainval/frozen_inference_graph.pb" \
--output_file="deeplab_xception_pascal.tflite" \
--output_format=TFLITE \
--input_shape=1,513,513,3 \
--input_arrays="sub_7" \
--output_arrays="ArgMax" \
--inference_type=FLOAT \
--allow_custom_ops
Thanks!
According to https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md, xception65_coco_voc_trainval with 3 eval scales takes about 223 seconds. The frozen graph has a single eval scale, so ~25 seconds sounds about right to me.
To speed up inference with TFLite I would normally suggest using the GPU delegate, but as you are running on a PC, you will need to find a smaller model instead. Maybe try one of the MobileNet-based models? The edgetpu models will run in TFLite without an Edge TPU and should be quite fast, although these are trained on Cityscapes.

Training my own dataset on deeplab but inference all black

I am trying to train the deeplab model from the TensorFlow Model Garden on my own dataset. The loss decreases over time, and I am using the pre-trained model provided by the official repo.
But when I try to visualize results with the latest checkpoint, or freeze the model to .pb and run inference, the output is nothing but a black image (I checked these images with NumPy and all pixels are 0).
My training script looks like this:
python deeplab/train.py \
--logtostderr \
--num_clones=1 \
--training_number_of_steps=500000 \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size="513,513" \
--train_batch_size=2 \
--dataset=${OWN_DATASET} \
--train_logdir=${TRAIN_LOGDIR} \
--dataset_dir=${DATASET_DIR} \
--tf_initial_checkpoint=${INITIAL_CHECKPOINT}
Has anyone run into this before?
This is an old thread and I don't know if you still need help but you haven't provided much information regarding your dataset. Here are some general pointers:
Try setting these flags in train.py
--fine_tune_batch_norm=False \
--initialize_last_layer=False \
--last_layers_contain_logits_only=True \
Make sure the label masks in your SegmentationClassRaw folder use the values 0, 1, 2, 3, ..., where 0 is the background and 1, 2, 3, ... are the individual classes. Run NumPy's "asarray(image)" on a mask to inspect these pixel values and make sure the labels are correct.
If you have an unbalanced dataset, you can try setting the weights for the different labels through train.py:
# Weights, in order, for label 0 (background), label 1 (object class 1) and label 2 (object class 2)
--label_weights=1 \
--label_weights=10 \
--label_weights=15 \
If all else fails, try a larger dataset. A dataset of 225 images and 2000 steps (with an initial mobilenetv2 checkpoint) yielded results for me, although the accuracy was not very good since the dataset was small. Just as a reference point, the loss on this small dataset was around 0.05-0.06 after 2000 steps.
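The mask check and the label weighting described above can be sketched roughly as follows. This is only an illustration: the function names are made up, the toy array stands in for a mask loaded from SegmentationClassRaw, the ignore label of 255 is an assumption based on how DeepLab datasets are commonly set up, and inverse-frequency weighting is just one possible heuristic for choosing --label_weights, not something the answer prescribes.

```python
import numpy as np


def label_histogram(masks, num_classes, ignore_label=255):
    """Count pixels per class and fail on label values outside 0..num_classes-1."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for mask in masks:
        mask = np.asarray(mask)
        bad = [int(v) for v in np.unique(mask)
               if v != ignore_label and v >= num_classes]
        if bad:
            raise ValueError(f"unexpected label values: {bad}")
        valid = mask[mask != ignore_label]  # ignored pixels are not counted
        counts += np.bincount(valid.ravel(), minlength=num_classes)
    return counts


def inverse_frequency_weights(counts):
    """Heuristic: weight = largest class count / class count (largest class gets 1)."""
    counts = np.asarray(counts, dtype=np.float64)
    return counts.max() / np.maximum(counts, 1.0)


# Toy 4x4 mask: 0 = background, 1 and 2 = object classes.
toy_mask = np.array([[0, 0, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 1, 2],
                     [0, 0, 0, 0]], dtype=np.uint8)

counts = label_histogram([toy_mask], num_classes=3)
print(counts)                             # pixels per label
print(inverse_frequency_weights(counts))  # candidate label weights
```

If the histogram shows zero pixels for every class except the background (for example because objects were saved as 255 rather than 1), an all-black prediction is what I would expect.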

Accuracy drop for Tensorflow object detection Post Quantization

I am fine-tuning SSD Mobilenet v2 for a custom dataset. I am fine-tuning the model for 50k steps and quantization aware training kicks in at 48k step count.
graph_rewriter {
quantization {
delay: 48000
weight_bits: 8
activation_bits: 8
}
}
I am observing a 95%+ training, validation and testing mAP post training.
I then convert the model to a quantized TFLite model using the commands:
python object_detection/export_tflite_ssd_graph.py \
--pipeline_config_path=${CONFIG_FILE} \
--trained_checkpoint_prefix=${CHECKPOINT_PATH} \
--output_directory=${OUTPUT_DIR} --add_postprocessing_op=true
./bazel-bin/tensorflow/contrib/lite/toco/toco \
--input_file=${OUTPUT_DIR}/tflite_graph.pb \
--output_file=${OUTPUT_DIR}/detect.tflite \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TFLITE \
--inference_type=QUANTIZED_UINT8 \
--input_shapes="1,300,300,3" \
--input_arrays=normalized_input_image_tensor \
--output_arrays="TFLite_Detection_PostProcess","TFLite_Detection_PostProcess:1","TFLite_Detection_PostProcess:2","TFLite_Detection_PostProcess:3" \
--std_values=128.0 --mean_values=128.0 --allow_custom_ops --default_ranges_min=0 --default_ranges_max=6
I tested the generated detect.tflite model using the same test set and see the mAP drop to about 85%.
Is this drop in mAP to be expected? How can I improve the post-quantization mAP?
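For context on the --mean_values and --std_values flags in the command above: as I understand the converter's convention, a quantized uint8 input q is interpreted as the real value (q - mean_value) / std_value. The sketch below just spells out that arithmetic for mean 128 and std 128; the function names are hypothetical and nothing here calls TensorFlow.

```python
def dequantize_input(q, mean_value=128.0, std_value=128.0):
    """Real value that a uint8 input q represents under the assumed convention."""
    return (q - mean_value) / std_value


def quantize_input(real, mean_value=128.0, std_value=128.0):
    """Inverse mapping, rounded and clamped to the uint8 range 0..255."""
    q = round(real * std_value + mean_value)
    return max(0, min(255, q))


print(dequantize_input(0), dequantize_input(255))  # ends of the representable range
print(quantize_input(-1.0), quantize_input(0.0), quantize_input(1.0))
```

So with these flags the model sees inputs in roughly [-1, 1). If the evaluation pipeline for detect.tflite feeds pixels scaled differently from the float model, that mismatch alone could cost accuracy. It is one thing worth checking, not a diagnosis.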

Tensorboard eval.py IOU for object detection

I used the ssd_mobilenet_v1_coco from detection model zoo in tensorflow object detection. I am currently training the model by running
python legacy/train.py --logtostderr --train_dir=trainingmobile/ --pipeline_config_path=trainingmobile/pipeline.config
I want to run an evaluation job by running eval.py to get other metrics like IOU and PR Curve but I don't know how to do that. I am able to run the command
python legacy/eval.py \
--logtostderr \
--checkpoint_dir=path/to/checkpoint \
--eval_dir=path/to/eval \
--pipeline_config_path=path/to/config
then I ran the command
tensorboard --logdir=path/to/eval
TensorBoard shows only the test image output. How can I get other metrics like IOU and PR Curve?
First of all, I'd highly recommend using the newer model_main.py script, which combines training and evaluation. You can use it as shown below:
python object_detection/model_main.py \
--pipeline_config_path=path/to/config \
--model_dir=path/to/train_dir \
--num_train_steps=NUM_TRAIN_STEPS \
--num_eval_steps=NUM_EVAL_STEPS \
--alsologtostderr
It combines training and evaluation, and you can open TensorBoard with
tensorboard --logdir=path/to/train_dir
TensorBoard will not only display the training process, it will also show your progress on the validation set. It uses the COCO metrics as the default!
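As background for the IOU part of the question: the COCO metrics are built on box IoU, with mAP averaged over IoU thresholds from 0.50 to 0.95. A minimal sketch of the IoU of two boxes is below. The function name and the (ymin, xmin, ymax, xmax) ordering are choices made for this example, so adapt them to your own box format.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    # Corners of the overlapping rectangle (may be empty).
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# A detection counts as a match at threshold t when its IoU with a
# ground-truth box of the same class is at least t.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```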
To your original problem: Maybe you should change the eval settings in your config file to larger numbers:
eval_config: {
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10}
If you use the model_main.py script, the number of evaluation steps is set by the flags.
Good to know: The info output of TensorFlow is disabled in the newer model_main.py script. You can enable it by adding
tf.logging.set_verbosity(tf.logging.INFO)
after the import section.

TensorFlow lite: High loss in accuracy after converting model to tflite

I have been trying TFLite to increase detection speed on Android but strangely my .tflite model now almost only detects 1 category.
I have done testing on the .pb model that I got after retraining a mobilenet and the results are good but for some reason, when I convert it to .tflite the detection is way off...
For the retraining I used the retrain.py file from Tensorflow for poets 2
I am using the following commands to retrain, optimize for inference and convert the model to tflite:
python retrain.py \
--image_dir ~/tf_files/tw/ \
--tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v1_100_224/feature_vector/1 \
--output_graph ~/new_training_dir/retrainedGraph.pb \
--saved_model_dir ~/new_training_dir/model/ \
--how_many_training_steps 500
sudo toco \
--input_file=retrainedGraph.pb \
--output_file=optimized_retrainedGraph.pb \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TENSORFLOW_GRAPHDEF \
--input_shape=1,224,224,3 \
--input_array=Placeholder \
--output_array=final_result
sudo toco \
--input_file=optimized_retrainedGraph.pb \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TFLITE \
--output_file=retrainedGraph.tflite \
--inference_type=FLOAT \
--inference_input_type=FLOAT \
--input_arrays=Placeholder \
--output_array=final_result \
--input_shapes=1,224,224,3
Am I doing anything wrong here? Where could the loss in accuracy come from?
I faced the same issue while I was trying to convert a .pb model into .lite.
In fact, my accuracy would come down from 95 to 30!
It turned out my mistake was not in the conversion from .pb to .lite or in the command used for it. It was in how I loaded and pre-processed the image before passing it to the lite model and running inference with interpreter.invoke().
The code below shows what I mean by pre-processing:
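import cv2
import numpy as np

# interpreter, input_tensor_index, output and result (the label list) are assumed to be set up earlier.
test_image = cv2.imread(file_name)
# Resize to the input size used in training; interpolation must be passed by keyword.
test_image = cv2.resize(test_image, (299, 299), interpolation=cv2.INTER_AREA)
# Scale pixels to [0, 1], add the batch dimension and cast to float32.
test_image = np.expand_dims(test_image / 255, axis=0).astype(np.float32)
interpreter.set_tensor(input_tensor_index, test_image)
interpreter.invoke()
digit = np.argmax(output()[0])
prediction = result[digit]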
As you can see there are two crucial commands/pre-processing done on the image once it is read using "imread()":
i) The image should be resized to the size that is the "input_height" and "input_width" values of the input image/tensor that was used during the training. In my case (inception-v3) this was 299 for both "input_height" and "input_width". (Read the documentation of the model for this value or look for this variable in the file that you used to train or retrain the model)
ii) The next command in the above code is:
test_image = np.expand_dims((test_image)/255, axis=0).astype(np.float32)
I got this from the "formulae"/model code:
test_image = np.expand_dims((test_image-input_mean)/input_std, axis=0).astype(np.float32)
Reading the documentation revealed that for my architecture input_mean = 0 and input_std = 255.
Once I made these changes to my code, I got the accuracy I expected (90%).
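The normalization step can be isolated into a small helper so the same input_mean and input_std are used everywhere. This is a sketch with made-up names, using a synthetic image instead of cv2.imread so it runs without OpenCV or a model. The 299 input size and the 0 / 255 mean and std are the values from my Inception v3 case; other models (such as the 224x224 MobileNet in the question) will need their own values from the model documentation.

```python
import numpy as np


def preprocess(image, input_size, input_mean=0.0, input_std=255.0):
    """Normalize an already-resized HxWx3 uint8 image and add a batch dimension."""
    image = np.asarray(image)
    if image.shape[:2] != (input_size, input_size):
        raise ValueError(f"expected {input_size}x{input_size}, got {image.shape[:2]}")
    normalized = (image.astype(np.float32) - input_mean) / input_std
    return np.expand_dims(normalized, axis=0)  # shape (1, H, W, 3)


# Synthetic all-white image standing in for a resized photo.
dummy = np.full((299, 299, 3), 255, dtype=np.uint8)
batch = preprocess(dummy, input_size=299)
print(batch.shape, batch.dtype, batch.min(), batch.max())
```

The resulting array is what would be passed to interpreter.set_tensor(). If the values it prints are not in the range the model was trained on, the predictions will be off no matter how the conversion was done.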
Hope this helps.
Please file an issue on GitHub https://github.com/tensorflow/tensorflow/issues and add the link here.
Also please add more details on what you are retraining the last layer for.