I am using ssd_mobilenet_v1_coco from the detection model zoo of the TensorFlow Object Detection API. I am currently training the model by running
python legacy/train.py --logtostderr --train_dir=trainingmobile/ --pipeline_config_path=trainingmobile/pipeline.config
I want to run an evaluation job by running eval.py to get other metrics like IOU and PR Curve but I don't know how to do that. I am able to run the command
python legacy/eval.py \
--logtostderr \
--checkpoint_dir=path/to/checkpoint \
--eval_dir=path/to/eval \
--pipeline_config_path=path/to/config
Then I ran the command
tensorboard --logdir=path/to/eval
TensorBoard shows only the test image output. How can I get other metrics like IoU and the PR curve?
First of all, I'd highly recommend using the newer model_main.py script, which handles training and evaluation together. You can use it as shown below:
python object_detection/model_main.py \
--pipeline_config_path=path/to/config \
--model_dir=path/to/train_dir \
--num_train_steps=NUM_TRAIN_STEPS \
--num_eval_steps=NUM_EVAL_STEPS \
--alsologtostderr
It combines training and evaluation, and you can open TensorBoard with
tensorboard --logdir=path/to/train_dir
TensorBoard will not only display the training progress, it will also show the results on your validation set. The COCO metrics are used by default.
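Since the question asks about IoU: the COCO metrics are all built on the intersection over union between a predicted box and a ground-truth box. Here is a minimal sketch of that computation, assuming boxes are given as [ymin, xmin, ymax, xmax]. The function name box_iou is made up for illustration and is not part of the Object Detection API.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as [ymin, xmin, ymax, xmax].

    Hypothetical helper for illustration; the box layout is an assumption.
    """
    # Corners of the overlapping rectangle (if any).
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get an empty intersection.
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes that overlap in a 1x1 square.
    print(box_iou([0, 0, 2, 2], [1, 1, 3, 3]))
```

A detection counts as a true positive at a given threshold when its IoU with a matching ground-truth box reaches that threshold. The headline COCO mAP averages AP over the IoU thresholds 0.50 to 0.95 in steps of 0.05.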
To your original problem: Maybe you should change the eval settings in your config file to larger numbers:
eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}
If you use the model_main.py script, the number of evaluations is set by the flags.
Good to know: TensorFlow's INFO output is disabled in the newer model_main.py script. You can enable it by adding
tf.logging.set_verbosity(tf.logging.INFO)
after the import section.
Related
I have trained an object detection model in TensorFlow 2 with EfficientDet, and now I am trying to evaluate its performance on a test dataset. I ran the command below to evaluate the model:
python model_main_tf2.py \
--model_dir=path/to/model_dir \
--pipeline_config_path=path/to/model_dir/pipeline.config \
--checkpoint_dir=path/to/checkpoint_dir
And I can see the results below:
Here I can see Average-Precision for IoU=0.50 (second line) and IoU=0.75 (third line).
Now, is there any way that I can see Average-Precision for IoU=0.8 and IoU=0.9 too?
Also, can I see Average-Recall for IoU=0.8 and IoU=0.9 too?
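One possible route, sketched under assumptions: the printed summary comes from pycocotools, whose COCOeval object keeps the full precision table in eval['precision'] after accumulate(), indexed as [iou][recall][class][area][max_dets], with -1 marking entries that have no data. The default IoU thresholds are 0.50 to 0.95 in steps of 0.05, so 0.8 and 0.9 are already computed and only missing from the printed summary. The helper below (mean_at_iou is a made-up name) averages the valid entries at one threshold. It assumes you can get hold of such a table, for example by running pycocotools yourself on exported detections. I am not aware of a model_main_tf2.py flag that prints these values directly.

```python
# Default COCO IoU thresholds: 0.50, 0.55, ..., 0.95 (10 values).
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_at_iou(table, iou, area_idx=0, max_det_idx=2):
    """Average the valid entries of a COCO-style precision table at one IoU.

    `table` is indexed [iou][recall][class][area][max_dets]; nested lists and
    NumPy arrays both work. Entries equal to -1 mean "no data" and are skipped.
    area_idx=0 and max_det_idx=2 correspond to area="all" and maxDets=100
    under the default pycocotools parameters.
    """
    t = COCO_IOU_THRESHOLDS.index(round(iou, 2))
    values = [
        per_class[area_idx][max_det_idx]
        for per_recall in table[t]
        for per_class in per_recall
    ]
    valid = [v for v in values if v > -1]
    # Mirror the summary convention: -1 when nothing valid is available.
    return sum(valid) / len(valid) if valid else -1.0


if __name__ == "__main__":
    # Tiny synthetic table: 10 IoUs x 2 recall points x 1 class x 4 areas x 3 maxDets.
    demo = [[[[[0.1 * t] * 3 for _ in range(4)]] for _ in range(2)] for t in range(10)]
    print(mean_at_iou(demo, 0.8), mean_at_iou(demo, 0.9))
```

For recall, eval['recall'] has the same layout without the recall axis, so the same idea applies with one loop level fewer.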
I am trying to evaluate a trained SSD_Mobilenetv2 320x320 fpnlite model in TensorFlow. I ran training and evaluation in parallel in two different Colab accounts, but the metric values in the evaluation result are always -1 after each checkpoint. The loss also increases after each checkpoint is evaluated.
Below is the command used to run the evaluation:
!python /model_main_tf2.py \
--pipeline_config_path /pipeline.config \
--model_dir /model_ \
--checkpoint_dir /model_
I am training MobileNet_v1_1.0_224 using TensorFlow. I am using the python scripts present in the TensorFlow-Slim image classification model library for training. My dataset distribution with 4 classes is as follows:
normal_faces: 42070
oncall_faces: 13563 (People faces with mobile in the image when they're on call)
smoking_faces: 5949
yawning_faces: 1630
All images in the dataset are square images and larger than 224x224
I am using train_image_classifier.py to train the model with the following arguments:
python train_image_classifier.py \
--train_dir=${TRAIN_DIR} \
--dataset_name=custom \
--dataset_split_name=train \
--dataset_dir=${DATASET_DIR} \
--model_name=mobilenet_v1 \
--batch_size=32 \
--max_number_of_steps=25000
After training the model, eval_image_classifier.py shows an accuracy greater than 95% on the validation set, but when I export the frozen graph and use it for predictions, it performs very poorly.
I have also tried this notebook but this also produced similar results.
Log: Training Log
Plots: Loss and Accuracy
What is the reason for this? How do I fix this issue?
I have seen similar issues on SO but nothing related to MobileNets specifically.
Did you use a validation set? If so, what was the validation accuracy?
If you used a validation set, a good way to check whether you are doing predictions properly is to run model.predict on the validation set.
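When you run that check, overall accuracy alone can be misleading with classes this imbalanced. A minimal sketch, assuming you have already collected the true and predicted labels from the exported graph as plain Python lists (both function names are made up for illustration, and the class counts are copied from the question):

```python
from collections import Counter

# Class counts as listed in the question.
CLASS_COUNTS = {
    "normal_faces": 42070,
    "oncall_faces": 13563,
    "smoking_faces": 5949,
    "yawning_faces": 1630,
}


def majority_baseline(counts):
    """Accuracy of a model that always predicts the most frequent class."""
    return max(counts.values()) / sum(counts.values())


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in total.items()}


if __name__ == "__main__":
    print("majority baseline:", majority_baseline(CLASS_COUNTS))
    # Toy labels only, to show the output shape.
    print(per_class_recall(
        ["normal_faces", "normal_faces", "yawning_faces"],
        ["normal_faces", "yawning_faces", "yawning_faces"],
    ))
```

If the validation split has roughly the same class proportions (an assumption), always predicting normal_faces already scores about two thirds, so comparing per-class recall between eval_image_classifier.py and the frozen graph says more about where the two pipelines differ.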
I downloaded vgg_19_2016_08_28.tar.gz and extracted a vgg-19.pb graph. I am using this for tf2onnx. However, the graph seems to have some dynamic parameters, and hence tf2onnx is failing. I want to check whether vgg-19.pb is a frozen graph. If not, how can I get a frozen vgg_19.pb graph?
Same question for tensorflow_inception_graph - inception_v3_2016_08_28.tar.gz
Same question for resnet - resnet_v1_50_2016_08_28.tar.gz
All downloaded from here - https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models
To convert TF models to ONNX you need to freeze the graph. The TensorFlow tool to freeze the graph is https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py
For example:
python -m tensorflow.python.tools.freeze_graph \
--input_graph=my_checkpoint_dir/graphdef.pb \
--input_binary=true \
--output_node_names=output \
--input_checkpoint=my_checkpoint_dir \
--output_graph=tests/models/fc-layers/frozen.pb
To find the inputs and outputs of the TensorFlow graph, ask the model developer or use TensorFlow's summarize_graph tool (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms), for example:
summarize_graph --in_graph=tests/models/fc-layers/frozen.pb
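To check whether a .pb file is already frozen, one quick heuristic is to look for variable ops: freezing replaces variables with Const nodes, so a frozen graph should contain none. This is a sketch under assumptions rather than an official check. The helper names are made up, the list of variable op types may not be exhaustive, and load_nodes assumes the file is a binary GraphDef and that TensorFlow is installed.

```python
# Op types that indicate a variable was NOT folded into a constant.
# Assumed list; other variable-like ops may exist in some graphs.
VARIABLE_OPS = {"Variable", "VariableV2", "VarHandleOp"}


def load_nodes(pb_path):
    """Read a binary GraphDef and return (name, op) pairs for every node."""
    import tensorflow as tf  # imported lazily so the check below runs without TF

    graph_def = tf.compat.v1.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return [(node.name, node.op) for node in graph_def.node]


def unfrozen_nodes(nodes):
    """Return the names of nodes that are still variables."""
    return [name for name, op in nodes if op in VARIABLE_OPS]


if __name__ == "__main__":
    # Toy node list; with a real file use: unfrozen_nodes(load_nodes("vgg-19.pb"))
    toy = [("conv1/weights", "VariableV2"), ("conv1/Conv2D", "Conv2D"), ("bias", "Const")]
    print(unfrozen_nodes(toy))
```

An empty result suggests the graph is frozen. Any names returned still need a checkpoint and a freeze_graph run.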
I am trying to convert a fine-tuned TensorFlow inception_v3 model to UFF format so that it can run on NVIDIA's Jetson TX2. The UFF conversion supports only certain ops. I am able to successfully freeze the inception_v3 model with the ImageNet checkpoint provided by TensorFlow and convert it to UFF. However, if I fine-tune the model, additional ops like Floor, RandomUniform, etc. appear in the new graph, and these are not yet supported. These layers remain even after freezing the model. The same happens with the flowers fine-tuning sample provided on the TensorFlow site.
I want to understand why additional ops are added to the graph, when fine-tuning is only supposed to modify the final layer to match the required number of outputs.
If they are added during training, how can I get rid of them? What post-processing steps did the TensorFlow team follow before releasing the inception_v3 model for ImageNet?
I can share the pbtxt files if needed. For now, model layers details are uploaded at https://github.com/shrutim90/TF_to_UFF_Issue. I am using Tensorflow 1.6 with GPU.
I am following the steps to freeze or fine-tune the model from: https://github.com/tensorflow/models/tree/master/research/slim#Pretrained. As described in the above link, to reproduce the issue, install TF-Slim image models library and follow these steps:
1. python export_inference_graph.py \
--alsologtostderr \
--model_name=inception_v3 \
--output_file=/tmp/inception_v3_inf_graph.pb
2. python freeze_graph.py \
--input_graph=/tmp/inception_v3_inf_graph.pb \
--input_checkpoint=/tmp/checkpoints/inception_v3.ckpt \
--input_binary=true --output_graph=/tmp/frozen_inception_v3.pb \
--output_node_names=InceptionV3/Predictions/Reshape_1
3. DATASET_DIR=/tmp/flowers
TRAIN_DIR=/tmp/flowers-models/inception_v3
CHECKPOINT_PATH=/tmp/my_checkpoints/inception_v3.ckpt
python train_image_classifier.py --train_dir=$TRAIN_DIR --dataset_dir=$DATASET_DIR --dataset_name=flowers --dataset_split_name=train --model_name=inception_v3 --checkpoint_path=${CHECKPOINT_PATH} --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits
4. python freeze_graph.py \
--input_graph=/tmp/graph.pbtxt \
--input_checkpoint=/tmp/checkpoints/model.ckpt-2539 \
--input_binary=false --output_graph=/tmp/frozen_inception_v3_flowers.pb \
--output_node_names=InceptionV3/Predictions/Reshape_1
To check the layers, you can inspect the .pbtxt file or use NVIDIA's convert-to-uff utility.
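Run the steps in this order: training script -> export_inference_graph -> freeze_graph. In other words, freeze the graph produced by export_inference_graph.py with the fine-tuned checkpoint, rather than the graph.pbtxt written during training. This gets rid of all the extra nodes, and the model can then be easily converted to UFF.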
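To confirm that the extra nodes are gone, you can compare the op types of the two frozen graphs. In TF 1.x, Floor and RandomUniform most likely come from dropout, which exists only in the training graph. A minimal sketch, assuming you have already extracted the op type of every node (for example node.op from each GraphDef node) into plain lists; the function name and the toy lists are made up for illustration and are not taken from the actual graphs:

```python
from collections import Counter


def extra_op_types(reference_ops, candidate_ops):
    """Count op types that appear in the candidate graph but not in the reference."""
    reference = set(reference_ops)
    return Counter(op for op in candidate_ops if op not in reference)


if __name__ == "__main__":
    # Toy op lists standing in for the ImageNet graph and the fine-tuned graph.
    imagenet_ops = ["Placeholder", "Conv2D", "Relu", "Softmax"]
    finetuned_ops = ["Placeholder", "Conv2D", "Relu", "RandomUniform", "Floor", "Softmax"]
    print(extra_op_types(imagenet_ops, finetuned_ops))
```

An empty Counter means the fine-tuned graph uses no op types beyond those in the reference graph.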